Abstract

In topology inference from data, current approaches face two major problems. One concerns the
selection of a correct parameter to build an appropriate complex on top of the data points; the other
involves the typically ‘large’ size of this complex. We address these two issues in the context of
inferring homology from sample points of a smooth manifold of known dimension sitting in a Euclidean
space $\mathbb{R}^k$. We show that, for a sample size of $n$ points, we can identify a set of $O(n^2)$ points (as
opposed to $O(n^{\lceil k/2 \rceil})$ Voronoi vertices) approximating a subset of the medial axis that suffices to compute
a distance sandwiched between the well known local feature size and the local weak feature size (in fact,
the approximating set can be further reduced in size to $O(n)$). This distance, called the lean feature
size, helps in pruning the input set at least to the level of local feature size while making the data locally
uniform. The local uniformity in turn helps in building a complex for homology inference on top of the
sparsified data without requiring any user-supplied distance threshold. Unlike most topology inference
results, ours does not require that the input is dense relative to a global feature such as reach or weak
feature size; instead it can be adaptive with respect to the local feature size. We present some empirical
evidence in support of our theoretical claims.


1 Introduction 

In recent years, considerable progress has been made in analyzing data for inferring the topology of a space 
from which the data is sampled. Often this process involves building a complex on top of the data points, and 
then analyzing the complex using various mathematical and computational tools developed in computational 
topology. There are two main issues that need attention to make this approach viable in practice. The first 
one stems from the requirement of choosing appropriate parameters to build the complexes so that the 
provable guarantees align with the computations. The other one arises from the unmanageable ‘size’ of 
the complex—a problem compounded by the fact that the input can be large and usual complexes such as 
Vietoris-Rips built on top of it can be huge in size. 

In this paper, we address both of the above two issues with a technique for data sparsification. The data 
points are assumed to be sampled from a smooth manifold of known dimension sitting in some Euclidean 
space. We sparsify the data so that the resulting set is locally uniform and is still good for homology 
inference. Observe that, with a sample whose density varies with respect to a local feature size (such as the
lfs proposed for surface reconstruction [2]), no global parameter for building an appropriate complex can be
found. The figure in the next paragraph illustrates this difficulty. 
For the non-uniformly sampled curve, there is no single radius that can
be chosen to construct, for example, Rips or Čech complexes. To connect
points in the sparsely sampled part on the right, the radius needs to be bigger
than the feature size at the small neck in the middle. If chosen, this radius
destroys the neck in the middle, thus creating spurious topology. Our solution
to this problem is a sparsification strategy so that the sample becomes locally uniform ifT^ [T5l while
guaranteeing that no topological information is lost. The sparsification is carried out without requiring any
extra parameter and the resulting local uniformity eventually helps in constructing the appropriate complex on
top of the sparsified set without requiring any user-supplied parameter.

The sparsification also addresses the problem of ‘size’ because it produces a sub-sample of the original
input. The technique of subsampling has been suggested in some of the recent works. The well-known
witness complex builds on the idea of subsampling the input data by restricting the Delaunay centers on
the data points [12]. Unfortunately, guarantees about topological inference cannot be achieved with witness
complexes unless some non-trivial modifications are made and parameters are tuned. Sparsified Rips
complexes proposed by Sheehy [20] also use subsampling to summarize the topological information contained
in a Rips filtration (a nested sequence). The graph induced complex proposed in ifT^ alleviates the ‘size’
problem even further by replacing the Rips complexes with a more sparsified complex. Both approaches,
however, only approximate the true persistence diagram and hence, to infer homology exactly, require a
user-supplied parameter to find the ‘sweet spot’ in the filtration range. Furthermore, none of these sparsifications
is designed to work with a non-uniform input that is adaptive to a local as opposed to a global feature size.

Our algorithm first identifies a set of points that supposedly approximates only a subset of the medial
axis. It is known that the medial axis of a manifold embedded in $\mathbb{R}^k$ can be approximated with the Voronoi
diagrams of the $n$ input sample points JUIUO, which requires $O(n^{\lceil k/2 \rceil})$ Voronoi vertices in the worst case.

In contrast, we approximate the medial axis only with a lean set of $O(n^2)$ points (which can be brought
down to $O(n)$ with some more processing as shown in Section 2.3). The distance to this lean set, which we
call the lean feature size, is shown to be sandwiched between the local feature size lfs and the weak local
feature size wlfs. Sparsifying the input with respect to this lean feature size allows the data to be decimated
at least to the level of lfs, but at the same time keeps it dense enough with respect to the weak local feature
size, which eventually leads to topological fidelity. This roughly means that the data is sparsified adaptively
as much as possible without sacrificing the topological information (see experimental results in Figure 1).

The sparsified points are connected in a Rips-like complex using the lean feature size computed for
each sample point. Following the approach in [11], the guarantee for topological fidelity is obtained by
interleaving the union of a set of balls with the offsets of the manifold. To account for the adaptivity of the
sample density, these offsets are scaled appropriately by the lean feature size and the approach in [11] is
adapted to this framework. To the best of our knowledge, this is the first sparsification strategy that handles
adaptive input samples, produces an adaptive as well as a locally uniform sparsified sample, and infers
homology without requiring a threshold parameter.

2 Sparsification 

Let $X$ be a smooth compact manifold embedded in a $k$-dimensional ambient Euclidean space $\mathbb{R}^k$. Our goal
is to sparsify a dense and possibly adaptive sample of $X$ and still be able to recover homological information
of $X$ from it.

Distance function, feature size, and sample density. Let $d(x, A)$ denote the distance between a point
$x \in \mathbb{R}^k$ and its closest point in a compact set $A \subset \mathbb{R}^k$. Consider the distance function $d_X : \mathbb{R}^k \to \mathbb{R}$
defined as $d_X(x) = d(x, X)$. Let $\Pi(x) = \{y \in X \mid d(x, y) = d(x, X)\}$ be the set of closest points of
$x \in \mathbb{R}^k$ in $X$. Notice that, for any $y \in \Pi(x)$, the segment $xy$ is contained in the normal space $N_y$ of $X$ at
$y$. The medial axis $M$ of $X$ is the closure of the set of points with at least two closest points in $X$, and thus
$M := \mathrm{closure}\,\{m \in \mathbb{R}^k \mid |\Pi(m)| \ge 2\}$.

Figure 1: Sparsification: Original samples of 126500 and 101529 points of MotherChild and Botijo
are decimated to 6016 and 8622 points respectively. Betti numbers are computed correctly by our algorithm
(Section]^. The rightmost picture shows a 3D curve sample (top) and the lean set (bottom) approximating
a relevant subset of the medial axis which otherwise spans a much larger subspace of $\mathbb{R}^3$.

The local feature size at a point $x \in X$, denoted by lfs(x), is defined as the smallest distance between
$x$ and the medial axis $M$; that is, $\mathrm{lfs}(x) = d(x, M)$ [1]. There is another feature size definition that is
particularly useful for inferring homological information [10]. This feature size is defined as the distance
to the critical points of the distance function $d_X$, which is not differentiable everywhere. However, one can
still define the following vector which extends the concept of gradient to $d_X$ ifTSl. Specifically, given any
point $x \in \mathbb{R}^k \setminus X$, let $c(x)$ be the center of the unique minimal enclosing ball $B_x$ enclosing $\Pi(x)$. Define
the gradient vector at $x$ as $\nabla d(x) = \frac{x - c(x)}{d(x, X)}$ and the critical points as $C := \{x \in \mathbb{R}^k \mid \nabla d(x) = 0\}$. The weak
local feature size at a point $x \in X$, denoted by wlfs(x), is defined as $\mathrm{wlfs}(x) = d(x, C)$. Given an $\varepsilon$-dense
sample w.r.t. the lfs, which is known as the $\varepsilon$-sample in the literature HU, we would like to sparsify it to a
locally uniform sample w.r.t. some function, ideally lfs or wlfs. This motivates the following definition.

Definition 2.1 A discrete sample $P \subset X$ is called $c$-dense w.r.t. a function $\phi : X \to \mathbb{R}$ if $\forall x \in X$,
$d(x, P) \le c \cdot \phi(x)$. It is $c$-sparse if each pair of distinct points $p, q \in P$ satisfies $d(p, q) \ge c \cdot \phi(p)$. The
sample $P$ is called $(c_1, c_2)$-uniform w.r.t. $\phi$ if it is $c_1$-dense and $c_2$-sparse w.r.t. $\phi$.
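
Definition 2.1 can be checked by brute force on finite data. The following is only an illustrative sketch under stated assumptions: the manifold $X$ is replaced by a finite proxy list `manifold_pts` (hypothetical name), the inequalities are taken as non-strict, and `phi` is any user-supplied positive function; none of these names come from the paper.

```python
import math

def is_dense(sample, manifold_pts, phi, c):
    """c-dense: every x of the (finite proxy of the) manifold has a
    sample point within distance c * phi(x)."""
    return all(
        min(math.dist(x, p) for p in sample) <= c * phi(x)
        for x in manifold_pts
    )

def is_sparse(sample, phi, c):
    """c-sparse: every ordered pair of distinct sample points p, q
    satisfies d(p, q) >= c * phi(p)."""
    return all(
        math.dist(p, q) >= c * phi(p)
        for p in sample for q in sample if p != q
    )

def is_uniform(sample, manifold_pts, phi, c1, c2):
    """(c1, c2)-uniform: c1-dense and c2-sparse w.r.t. phi."""
    return is_dense(sample, manifold_pts, phi, c1) and is_sparse(sample, phi, c2)

# Toy usage: a segment sampled at three points, with phi constant.
X_proxy = [(i / 10, 0.0) for i in range(11)]
P = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0)]
phi_const = lambda x: 1.0
print(is_uniform(P, X_proxy, phi_const, 0.25, 0.5))
```

With a non-constant `phi` the same check expresses adaptive uniformity, which is the setting of interest here.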

To produce a $(c_1, c_2)$-uniform sample w.r.t. lfs or wlfs one needs to compute lfs or wlfs or their
approximations. This in turn needs the computation of at least a subset of the medial axis or its approximation. One
option is to approximate this set using the Voronoi poles as in EJISl. This proposition faces two difficulties.
First of all, it needs computing the Voronoi diagram in high dimensions. Second, approximating the medial
axis may require a large number of samples when a manifold of a low co-dimension is embedded in a high
dimensional Euclidean space. To overcome this difficulty we propose to compute a discrete set $L$ near $M$
of small cardinality which helps in estimating the distance to a subset of $M$ (see the curve sample in Figure 1
for an example). The set $L$, called the lean set, allows us to define an easily computable feature size which
we call lean feature size. We show that this feature size is sandwiched between the lfs and wlfs, thereby
enabling us to sparsify an arbitrarily dense sample to a $(c_1, c_2)$-uniform sample w.r.t. a function bracketed
by lfs and wlfs. The constants $c_1, c_2$ are universal, which ultimately leads to a parameter-free inference of
the homology.

From now on, we assume that the input $P$ is a dense sample of $X$ in the following adaptive sense [3].
Each point is also equipped with normal information as stated in Assumption 2.2. We will see later how
this normal information can be computed.


Assumption 2.2 The input point set $P$ is $\varepsilon$-dense w.r.t. the lfs function on a compact smooth manifold $X \subset \mathbb{R}^k$
of known dimension without boundary. Also, every point $p \in P$ has an estimated normal space $\tilde{N}_p$ where
$\angle(N_p, \tilde{N}_p) \le \nu_\varepsilon$ (see Section 2.2 for computations of $\tilde{N}_p$).


Notice that while we assume the input to be $\varepsilon$-dense w.r.t. lfs, we do not need to know lfs and, locally,
the sample can be much denser and non-uniform. Now we define the lean set with respect to which we
define the lean feature size.


2.1 Lean set 

Definition 2.3 A pair $(p, q) \in P \times P$ is $\beta$-good for $0 < \beta < \frac{\pi}{2}$ if the following two conditions hold:

1. $\max\{\angle(\tilde{N}_p, pq), \angle(\tilde{N}_q, pq)\} \le \frac{\pi}{2} - \beta$.

2. Let $v = \frac{p+q}{2}$ be the midpoint of $pq$. The ball $B(v, c_\beta\, d(p, q))$ does not contain any point of $P$ where
Cy 3 = g tan 

Definition 2.4 The $\beta$-lean set $L_\beta$ is defined as:

$L_\beta = \{v \mid v = \frac{p+q}{2}$ is the midpoint of $pq$ where $(p, q)$ is a $\beta$-good pair$\}$.

The $\beta$-lean feature size is defined as $\mathrm{lnfs}_\beta(x) = d(x, L_\beta)$.
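
The two conditions for a good pair and the resulting lean set can be sketched in a few lines. This is a brute-force illustration, not the paper's implementation: the constant `c_beta` is passed in as a parameter, each estimated normal space is assumed to be given as a list of orthonormal basis vectors, and the ball in condition 2 is treated as closed; all function names are hypothetical.

```python
import math
from itertools import combinations

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def angle_to_subspace(v, basis):
    """Angle in [0, pi/2] between a nonzero vector v and span(basis);
    basis is assumed orthonormal."""
    proj_len = math.sqrt(sum(_dot(v, b) ** 2 for b in basis))
    return math.acos(min(1.0, proj_len / math.sqrt(_dot(v, v))))

def is_beta_good(i, j, points, normals, beta, c_beta):
    p, q = points[i], points[j]
    pq = _sub(q, p)
    # Condition 1: pq makes a small angle with both estimated normal spaces.
    worst = max(angle_to_subspace(pq, normals[i]),
                angle_to_subspace(pq, normals[j]))
    if worst > math.pi / 2 - beta:
        return False
    # Condition 2: the ball around the midpoint contains no sample point
    # (linear scan over P).
    v = tuple((a + b) / 2 for a, b in zip(p, q))
    radius = c_beta * math.dist(p, q)
    return all(math.dist(v, x) > radius for x in points)

def lean_set(points, normals, beta, c_beta):
    """Midpoints of all beta-good pairs (quadratically many candidates)."""
    return [tuple((a + b) / 2 for a, b in zip(points[i], points[j]))
            for i, j in combinations(range(len(points)), 2)
            if is_beta_good(i, j, points, normals, beta, c_beta)]

def lnfs(x, lean):
    """Lean feature size: distance from x to the lean set."""
    return min(math.dist(x, v) for v in lean)

# Toy usage: four samples on two parallel lines in the plane, normals along y.
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
nrm = [[(0.0, 1.0)] for _ in pts]
L = lean_set(pts, nrm, beta=math.pi / 3, c_beta=0.3)
print(L, lnfs(pts[0], L))
```

In this toy input only the two vertical pairs pass the angle test, so the lean set consists of the two midpoints lying halfway between the lines.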

One of our main results is the following property of the lean feature size (recall the definition of $\nu_\varepsilon$ in
Assumption 2.2).

Theorem 2.5 Let 6, /3 be two positive constants so that | > 9 > + §\/e + t'e/or a sufficiently small 

e < I sin^ 6. Then, 

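1. $\mathrm{lnfs}_\beta(x) \le c_1 \cdot \mathrm{wlfs}(x)$ for any point $x$ in $X$,

2. $\mathrm{lnfs}_\beta(p) \ge c_2 \cdot \mathrm{lfs}(p)$ for every point $p \in P$,

where $c_1 = 1 + \cos\theta + \varepsilon$ and $c_2 = \frac{2 c_0 c_\beta}{1 + c_0 + 2 c_0 c_\beta}$ with $c_0 = \sin(\beta - \nu_\varepsilon)$ are positive constants, and $c_\beta$ is the constant from Definition 2.3.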

The upper bound follows from Proposition 2.7, which shows a stronger result that $\mathrm{lnfs}_\beta$ is bounded from
above by the distance to a subset of the medial axis characterized by an angle condition. This set also
contains all critical points of the distance function $d_X$. First, we establish this result.

Definition 2.6 The $\theta$-medial axis $M_\theta \subseteq M$ of $X$ is defined as the set of points $m \in M$ where there exist two
points $x, y \in \Pi(m)$ such that $\angle xmy \ge 2\theta$.

We will see later that the concept of $\theta$-medial axis is also used as a bridge between geometry and topology
for our inference result. Our algorithm does not approximate $M_\theta$, but rather approximates the distances to
it by the lean set.

¹We note that $N_p$ and $\tilde{N}_p$ here are subspaces of $\mathbb{R}^k$. The angle between them refers to the smallest non-zero principal angle
between these two subspaces as used in the literature.
Proposition 2.7 Let $\theta, \beta$ be two positive constants satisfying the conditions of Theorem 2.5 for a sufficiently small
$\varepsilon$. Let $x$ be any point in $X$. Then, $\mathrm{lnfs}_\beta(x) = d(x, L_\beta) \le c \cdot d(x, M_\theta)$ where $c = 1 + \cos\theta + \varepsilon$ is
a positive constant.

Proof: Let $m = \mathrm{argmin}\, d(x, M_\theta)$. By definition, we have a pair of points $s, t$ in the manifold $X$ so that the
line segments $sm$ and $tm$ subtend an angle larger than or equal to $2\theta$ and both $sm$ and $tm$ are normal to $X$
at $s$ and $t$ respectively. Let $p \in P$ and $q \in P$ be the nearest sample points to $s$ and $t$ respectively. By the
$\varepsilon$-sampling condition of $P$, we have that $d(p, s) \le \varepsilon\, \mathrm{lfs}(s)$ and thus $\angle(N_s, N_p) = O(\varepsilon)$.

In Appendix A we show that the pair $(p, q)$ is $\beta$-good, hence its midpoint belongs to $L_\beta$. Notice that
$\max\{\mathrm{lfs}(s), \mathrm{lfs}(t)\} \le d(s, m) = d(t, m)$, and due to the $\varepsilon$-sampling condition, $d(\frac{p+q}{2}, \frac{s+t}{2}) \le \varepsilon\, d(s, m)$.
We then have:

$$d\left(\tfrac{p+q}{2}, m\right) \le d\left(\tfrac{s+t}{2}, m\right) + d\left(\tfrac{p+q}{2}, \tfrac{s+t}{2}\right) \le (\cos\theta + \varepsilon)\, d(s, m)$$

$$\Rightarrow\; d(x, L_\beta) \le d\left(x, \tfrac{p+q}{2}\right) \le d(x, m) + d\left(m, \tfrac{p+q}{2}\right) \le d(x, m) + d(s, m)(\cos\theta + \varepsilon). \quad (1)$$

Since $s$ is a closest point of $m$ in $X$, we have $d(s, m) = d(m, X) \le d(x, m)$. Combining this with Eqn (1),
it follows that

$$d(x, L_\beta) \le (1 + \cos\theta + \varepsilon) \cdot d(x, m).$$

■

We bound the distance $d(x, M_\theta)$ with wlfs(x) by observing the following. The critical points of a
distance function $d : \mathbb{R}^k \to \mathbb{R}$ can be characterized by points $x \in \mathbb{R}^k$ that have the zero gradient $\nabla d$ along
every unit vector originating at $x$; see Grove [16]. It is also known that the critical points of the distance
function $d_X$ lie in the medial axis $M$. They are points $m \in M$ so that the convex hull $\mathrm{Conv}(\Pi(m))$ of all
nearest neighbors of $m$ in $X$ contains $m$. This means that there exists a pair of points $x, y$ in $\Pi(m)$ so that
the angle $\angle xmy$ is large. We use this angle condition to avoid the critical points. Specifically, we show the
following result for manifolds of arbitrary codimension, which helps to make the angle condition precise.

Proposition 2.8 Let the ambient dimension $k > 1$ and $m \in M$ be a critical point of the distance function
$d_X$. There exists a pair of points $x, y \in \Pi(m)$ so that $\angle xmy \ge \frac{\pi}{2}$.

Proof: It is known that any critical point $m$ of the distance function $d_X$ is in the convex hull $C = \mathrm{Conv}\, \Pi(m)$
of the points in $\Pi(m)$. This convex hull $C$ is a $j$-polytope for some $j \le k$. We can assume that $j$ is at least
2, because otherwise, $C$ is an edge with endpoints say $x, y \in \Pi(m)$, and $\angle xmy = \pi \ge \frac{\pi}{2}$.

Now consider the subspace $\mathbb{R}^j \subseteq \mathbb{R}^k$ that contains the $j$-polytope $C$. Choose an arbitrary 2-flat $H$
passing through $m$ in this $\mathbb{R}^j$. The intersection of $H$ and $C$ is a polygon that contains $m$. There is at least a
pair of vertices $u, v$ of this polygon so that $\pi \ge \angle umv \ge \frac{\pi}{2}$. The vertices $u$ and $v$ are the intersections of the
2-flat with two codimension-2 faces $U$ and $V$ of $C$ respectively, which are $(j-2)$-faces.

Let $e$ be the maximal line segment contained in $U$ that connects $u$
and a vertex of $U$. We can show that one can choose an endpoint, say
$x$, of $e$ so that the angle $\angle umv$ remains at least $\frac{\pi}{2}$ when $u$ assumes
the position of $x$. To see this, consider the plane $L$ spanned by the
line of $e$ and the point $m$ (see figure on the right). Let $\ell$ be the line
perpendicular to the orthogonal projection of $mv$. Observe that all
points $z \in e$ make an angle $\angle zmv$ of at least $\frac{\pi}{2}$ if $z$ lies in the halfplane of $L$ delimited by $\ell$ which does
not contain the projection of $mv$. Then, one of the endpoints of $e$ must satisfy this condition because $u \in e$
does so to ensure $\angle umv \ge \frac{\pi}{2}$.

The chosen endpoint $x$ of $e$ is either a vertex of $C$ or a point in a lower dimensional face of $U$. Keeping
$u$ at $x$, we can let $v$ coincide with a similar endpoint of a line segment in $V$ while keeping the angle $\angle umv$
at least $\frac{\pi}{2}$. Therefore, continuing this process, $u$ and $v$ either reach a vertex of $C$ or a lower dimensional
face. It follows that both will reach a vertex of $C$ eventually while keeping the angle $\angle umv \ge \frac{\pi}{2}$. These
two vertices qualify for $x$ and $y$ in the proposition. ■

Remark 2.9 We remark that the above bound of $\frac{\pi}{2}$ can be further tightened with a term depending on the
dimension $k$. However, the bound of $\frac{\pi}{2}$ suffices for our results.

The following assertion is now immediate.

Proposition 2.10 For $\theta \le \frac{\pi}{4}$, every point $x \in X$ satisfies $d(x, M_\theta) \le \mathrm{wlfs}(x)$.

Propositions 2.7 and 2.10 together prove the upper bound of $\mathrm{lnfs}_\beta$ claimed in Theorem 2.5. Next, we
show the lower bound.


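Proposition 2.11 For every sample point $p \in P$, we have $\mathrm{lnfs}_\beta(p) \ge c_2 \cdot \mathrm{lfs}(p)$ where $c_2 = \frac{2 c_0 c_\beta}{1 + c_0 + 2 c_0 c_\beta}$ and
$c_0 = \sin(\beta - \nu_\varepsilon)$.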

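Proof: Let $z$ be the nearest point to $p$ in $L_\beta$, and $(p', q')$ the $\beta$-good pair that gives rise to $z$ (thus $z$ is the
midpoint of $p'q'$). By definition of a $\beta$-good pair, $\angle(\tilde{N}_{p'}, p'q') \le \frac{\pi}{2} - \beta$ and hence $\angle(N_{p'}, p'q') \le \frac{\pi}{2} - \beta + \nu_\varepsilon$.
There is a medial ball $B$ tangent to the manifold $X$ at $p'$ so that the half line $p'o$ going through the center $o$
of this ball $B$ realizes the angle $\angle(N_{p'}, p'q')$. Hence, $\angle op'q' \le \frac{\pi}{2} - \beta + \nu_\varepsilon$. It follows that

$$d(p', z) = \tfrac{1}{2} d(p', q') \ge d(p', o) \cos\left(\tfrac{\pi}{2} - \beta + \nu_\varepsilon\right) \ge c_0 \cdot \mathrm{lfs}(p'), \quad \text{where } c_0 = \sin(\beta - \nu_\varepsilon). \quad (2)$$

The empty ball condition of the $\beta$-good pair means that $2 c_\beta\, d(p', z) \le d(p, z)$, that is, $d(p', z) \le \frac{d(p, z)}{2 c_\beta}$. It
then follows that

$$d(p, p') \le d(p, z) + d(p', z) \le \left(1 + \tfrac{1}{2 c_\beta}\right) d(p, z).$$

By the 1-Lipschitz property of the lfs function and Eqn (2), we have:

$$\mathrm{lfs}(p) \le \mathrm{lfs}(p') + d(p, p') \le \mathrm{lfs}(p') + \left(1 + \tfrac{1}{2 c_\beta}\right) d(p, z) \le \tfrac{1}{c_0} d(p', z) + \left(1 + \tfrac{1}{2 c_\beta}\right) d(p, z)$$

$$\le \tfrac{1}{2 c_0 c_\beta} d(p, z) + \left(1 + \tfrac{1}{2 c_\beta}\right) d(p, z) = \left(1 + \tfrac{1}{2 c_\beta} + \tfrac{1}{2 c_0 c_\beta}\right) \cdot d(p, z).$$

Setting $c_2 = \frac{1}{1 + \frac{1}{2 c_\beta} + \frac{1}{2 c_0 c_\beta}} = \frac{2 c_0 c_\beta}{1 + c_0 + 2 c_0 c_\beta}$, we have that $d(p, z) = \mathrm{lnfs}_\beta(p) \ge c_2 \cdot \mathrm{lfs}(p)$, which proves the
proposition. ■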

We will see later that, /3 is fixed at a constant value of |. For this choice of /3, C 2 is not unusually small. 


2.2 Computations for sparsification 


In this section we describe the algorithm Lean that takes a standard $\varepsilon$-dense sample $P$ w.r.t. lfs of a hidden
manifold $X \subset \mathbb{R}^k$ of known intrinsic dimension, and outputs a sparsified set $Q \subseteq P$. The set $Q$ is both
adaptive and locally uniform as stated afterward in Theorem 2.12. The parameter $\rho$ is chosen later to be a
fixed constant less than 1.

The sparsification is based on the lean set $L_\beta$, which is computed in lines 2-4 of the algorithm. We note
that checking whether a pair $(p, q)$ is $\beta$-good or not requires no parameter other than $\beta$, which is set to a
fixed constant later in the homology inference algorithm. Clearly, $|L_\beta| = O(|P|^2)$ (see Section 2.3 for
improving $|L_\beta|$ to $O(|P|)$). There is one implementation detail which involves the estimation of the normal
space $\tilde{N}_p$ for every point $p \in P$. This estimation step is oblivious to any parameter but requires the intrinsic
dimension $s$ of $X$ to be known.

Algorithm 1 Lean($P$, $\beta$, $\rho$)

1: $L_\beta := \emptyset$; $Q := \emptyset$;

2: for every pair $(p, q) \in P \times P$ do

3: if $(p, q)$ is a $\beta$-good pair then $L_\beta := L_\beta \cup \{\frac{p+q}{2}\}$

4: end for

5: Put $P$ in a max priority queue $\mathcal{Q}$ with priority $\mathrm{lnfs}_\beta(p)$ for $p \in P$;

6: while $\mathcal{Q}$ not empty do

7: $q :=$ extract-max($\mathcal{Q}$); $Q := Q \cup \{q\}$;

8: delete any $p$ from $\mathcal{Q}$ if $d(q, p) \le \rho\, \mathrm{lnfs}_\beta(q)$

9: end while

We estimate the tangent space $T_p$ (thus the normal space) of $X$ at a point $p \in P$ as follows. Let $s$ be the
intrinsic dimension of the manifold $X$. Let $p_1 \in P$ be the nearest neighbor of $p$ in $P \setminus \{p\}$. Suppose we have
already obtained points $\sigma_i = \{p, p_1, \ldots, p_i\}$ with $i < s$. Let $\mathrm{aff}(\sigma_i)$ denote the affine hull of the points in $\sigma_i$.
Next, we choose $p_{i+1} \in P$ that is closest to $p$ among all points forming an angle within the range [f — f j f ]
with $\mathrm{aff}(\sigma_i)$. We add $p_{i+1}$ to the set and obtain $\sigma_{i+1} = \{p, p_1, \ldots, p_i, p_{i+1}\}$. This process is repeated until
$i + 1 = s$, the dimension of $X$, at which point we have obtained $s + 1$ points $\sigma_s = \{p, p_1, \ldots, p_s\}$. We use
$\mathrm{aff}(\sigma_s)$ to approximate the tangent space $T_p$. It turns out that the simplex $\sigma_s$ obtained this way has a good
thickness property, which by Corollary 2.6 in [4] implies that the angle between the tangent space and the
estimated tangent space at $p$ (thus also the angle between the normal space and the estimated normal space
at $p$) is bounded by $O(\varepsilon)$. The big-O hides terms depending only on the intrinsic property of the manifold.
See Appendix 1^ for details. In other words, we have that the error $\nu_\varepsilon$ in the estimated normal spaces (as
required in Assumption 2.2) is $O(\varepsilon)$.

Next, we put the points in $P$ in a priority queue and process them in the non-increasing order of their
distances to $L_\beta$. We iteratively remove the point $q$ with maximum value of $d(q, L_\beta)$ from the queue and
proceed as follows. We put $q$ into the sparse set $Q$ and delete any point from the queue that lies at a distance
of at most $\rho\, \mathrm{lnfs}_\beta(q)$ from $q$. Since we consider points in non-increasing order of their distances to $L_\beta$, no
earlier point that is already in the sparse set $Q$ can be deleted by this process.
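
The decimation loop described above can be sketched as follows. This is a minimal illustration assuming the values $\mathrm{lnfs}_\beta(p)$ have already been computed and are passed in as `lnfs_values` (a hypothetical name); it uses a binary heap with lazy deletion and a linear scan per extracted point rather than any specialized range-search structure.

```python
import heapq
import math

def lean_decimate(points, lnfs_values, rho):
    """Return indices of the points kept by the greedy decimation.

    Points are visited in non-increasing order of lnfs; each kept point q
    removes every remaining point within distance rho * lnfs(q) of q.
    """
    # heapq is a min-heap, so priorities are negated to extract the maximum.
    heap = [(-lnfs_values[i], i) for i in range(len(points))]
    heapq.heapify(heap)
    alive = [True] * len(points)
    kept = []
    while heap:
        _, i = heapq.heappop(heap)
        if not alive[i]:
            continue  # this point was deleted by an earlier kept point
        kept.append(i)
        alive[i] = False
        radius = rho * lnfs_values[i]
        for j in range(len(points)):
            if alive[j] and math.dist(points[i], points[j]) <= radius:
                alive[j] = False
    return kept

# Toy usage on the real line: three clustered points and one isolated point.
pts = [(0.0,), (0.1,), (0.2,), (1.0,)]
vals = [1.0, 0.9, 0.8, 0.7]
print(lean_decimate(pts, vals, rho=0.5))
```

Each kept point costs one pass over the input, which matches a quadratic worst-case bound for this step.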

Determining if a pair $(p, q)$ is $\beta$-good takes $O(|P|)$ time. This linear complexity is mainly due to the
range queries for balls required for testing the ‘empty ball’ condition 2 for $\beta$-goodness. Therefore, for $|L_\beta| =
O(|P|^2)$, the algorithm spends $O(|P|^3)$ time in total. This can be slightly improved to 0(|Pp“ fc 1^1)
using a general spherical range query data structure in the ambient space [1]. Once the lean set is
computed, the computation of $\mathrm{lnfs}_\beta$ for all points involves computing the nearest neighbor in $L_\beta$ for each point
$p \in P$. Using the method described in Section 2.3 we can bring down the lean set size to $O(|P|)$. Then,
computing $\mathrm{lnfs}_\beta$ takes at most $O(|P|^2)$ time in total. The actual sparsification in steps 6-9 takes only
$O(|Q||P|) = O(|P|^2)$ time.

We show that the decimation by Lean leaves the point set $Q$ locally uniform w.r.t. $\mathrm{lnfs}_\beta$. The proof
appears in Appendix A.


Theorem 2.12 Let P be a sample of a manifold X C M*', which is e-dense w.r.t. Ifs. For p < the output 
of Lean{P, /3, p) is a (|p, p)-uniform sample ofX w.r.t. Infsp when e > 0 is sufficiently small. 









2.3 Linear-size Lean Set 


Observe that the size $|L_\beta|$ is $O(n^2)$ if the input sample $P$ has size $n$. This is far less than $O(n^{\lceil k/2 \rceil})$,
$k$ being the ambient dimension, which one incurs if the medial axis is approximated with the Voronoi
diagrams ||9jO. We can further thin down the lean set to a linear size $O(n)$ for any fixed $k$ by the following
simple strategy:

For every $p \in P$, among all $\beta$-good pairs $(p, q)$ it forms, we choose the pair $(p, q^*)$ such that the distance
$d(p, q^*)$ is the smallest. We call this pair $(p, q^*)$ the minimal $\beta$-good pair for $p$. We now take a reduced lean
set, denoted by $\bar{L}_\beta$, as the collection of midpoints of these minimal $\beta$-good pairs. Obviously, $|\bar{L}_\beta| = O(n)$.

Below we show that this reduced lean set can replace the original lean set $L_\beta$: it only worsens the
distance from a sample point to the lean set by an additional constant factor. Note that this is the only
distance in the end required by the algorithm (and the homology inference in Theorem 3.10). In particular,
we have the following result.


Lemma 2.13 For any point $p \in P$, we have that $\mathrm{lnfs}_\beta(p) \le d(p, \bar{L}_\beta) \le (1 + \frac{1}{c_\beta})\, \mathrm{lnfs}_\beta(p)$.

Proof: The left inequality is trivial since $\bar{L}_\beta \subseteq L_\beta$. We will show the right inequality. Fix any sample point
$p \in P$, and let $m \in L_\beta$, the midpoint of a $\beta$-good pair $(s, t)$, be $p$’s nearest neighbor in the original lean set
$L_\beta$.

Let $(s, t^*)$ be the minimal $\beta$-good pair for $s$, and $m^*$ its midpoint. We now show that $d(p, \bar{L}_\beta) \le
d(p, m^*) \le (1 + \frac{1}{c_\beta})\, d(p, L_\beta)$. Indeed, since $(s, t^*)$ is the minimal $\beta$-good pair for $s$, we have that $d(s, t) \ge
d(s, t^*)$. Hence

$$d(m, m^*) \le d(m, s) + d(s, m^*) \le \tfrac{1}{2}\left(d(s, t) + d(s, t^*)\right) \le d(s, t).$$

At the same time, by the empty-ball property of a $\beta$-good pair, we have that $d(p, m) \ge c_\beta\, d(s, t)$; that is,
$d(s, t) \le \frac{1}{c_\beta} d(p, m)$. Putting everything together, we obtain:

$$d(p, \bar{L}_\beta) \le d(p, m^*) \le d(p, m) + d(m, m^*) \le d(p, m) + d(s, t) \le \left(1 + \tfrac{1}{c_\beta}\right) d(p, m) = \left(1 + \tfrac{1}{c_\beta}\right) d(p, L_\beta).$$

The claim then follows. ■
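
The thinning strategy of this section amounts to keeping, for each sample point, only its shortest good pair. A small sketch is given below; it assumes the good pairs are already available as index pairs in `good_pairs` (a hypothetical input), and ties are broken by first occurrence, which is a choice made here and not specified in the text.

```python
import math

def reduced_lean_set(points, good_pairs):
    """Midpoints of the minimal good pair of every point that has one.

    good_pairs: iterable of index pairs (i, j), each assumed beta-good.
    The result has at most len(points) midpoints.
    """
    best = {}  # point index -> (pair length, partner index)
    for i, j in good_pairs:
        d = math.dist(points[i], points[j])
        for a, b in ((i, j), (j, i)):
            if a not in best or d < best[a][0]:
                best[a] = (d, b)
    mids = {
        tuple((x + y) / 2 for x, y in zip(points[a], points[b]))
        for a, (_, b) in best.items()
    }
    return sorted(mids)

# Toy usage: three collinear points where every pair is assumed good.
pts = [(0.0, 0.0), (0.0, 1.0), (0.0, 3.0)]
pairs = [(0, 1), (0, 2), (1, 2)]
print(reduced_lean_set(pts, pairs))
```

Here the full lean set would contain three midpoints, while the reduced set keeps two: the midpoint of the pair (0, 2) is dropped because neither endpoint has it as its shortest pair.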


3 Homology inference 

In this section, we aim to infer homology groups of a hidden manifold $X$ from its point samples. Let
$H_i(\cdot)$ denote the $i$-dimensional homology group. It refers to the singular homology when the argument is a
manifold or a compact set, and to the simplicial homology when it is a simplicial complex. All homology
groups in this paper are assumed to be defined over the finite field $\mathbb{Z}_2$. For details on homology groups, see
e.g. GH.

The homology inference from a point sample of a hidden manifold $X$ has been researched extensively
in the literature im [m [H | 20 l. However, most of these works assume that the given sample $P \subset X$ is
globally dense, that is, $\varepsilon$-dense w.r.t. the infimum of lfs or wlfs. This strong assumption allows one to infer the
homology from an appropriate offset of $P$ w.r.t. the distance $d(x, P)$, which is represented with the union
of balls of equal radii around the sample points. As we indicated in the introduction, unfortunately, when
the sample is adaptive ($\varepsilon$-dense w.r.t. a non-constant function $\phi$), there may not be such a choice of a global
radius so that the offset captures the topology of $X$.

To circumvent this problem, one needs to scale the distance with the function $\phi$ that provides the
adaptivity. This idea was used in [8] where $\phi$ is taken as lfs. Approximating lfs is difficult, so we use $\mathrm{lnfs}_\beta$
instead for scaling. Observe that the offset may intersect the medial axis, but we argue that we can compute
relevant offsets that never contain the critical points of the scaled distance, thereby ensuring topological
fidelity.

3.1 Scaled distance and its offsets 

In what follows we develop the results in more generality by scaling the distance $d_X$ with the distance to a
finite set $L \subset \mathbb{R}^k$. Later, in computations, we replace $L$ by the lean set $L_\beta$ (for the fixed value of $\beta$) and the distance $d(x, L)$ with
$\mathrm{lnfs}_\beta$ for $x \in X$. Recall that $\Pi(x)$ denotes the set of closest neighbors of $x \in \mathbb{R}^k$ in $X$.

Definition 3.1 Given a finite set $L \subset \mathbb{R}^k$ such that $L \cap X = \emptyset$, let $h_L : \mathbb{R}^k \to \mathbb{R}$ be a scaled distance to the
manifold where

$$h_L(x) = \frac{d(x, X)}{d(x, X) + d(x, L)} = \frac{d(x, \Pi(x))}{d(x, \Pi(x)) + d(x, L)}.$$

We avoid the obvious choice of $h_L(x) = \frac{d(x, X)}{d(x, L)}$ because that makes $h_L(x)$ unbounded at $L$. We are
interested in analyzing the topology of the $\alpha$-offsets $X_\alpha = h_L^{-1}[0, \alpha]$ of $h_L$ (clearly, $X_0 = X$ since $L \cap X = \emptyset$)
when $X_\alpha \setminus X$ does not include any critical points of $h_L$. This brings us to the concept of flow induced by
the distance function which was studied in lIT^ and later used in the context of sampling theory llOl fTTlfT^.
The vector field $\nabla d$ as we defined earlier is not continuous. However, as it is shown in ifTSl, there exists
a continuous flow $F : \mathbb{R}^k \setminus X \times \mathbb{R}^+ \to \mathbb{R}^k \setminus X$ such that $F(x, t) = x + \int_0^t \nabla d(F(x, \tau))\, d\tau$. For a point
$x \in \mathbb{R}^k \setminus X$, the image $F(x, [0, t])$ of an interval $[0, t]$ is called its flow line. For a point $x \notin X \cup M$, where
$M$ is the medial axis of $X$, the flow line $F(x, [0, \infty))$ first coincides with the line segment $x\Pi(x)$ which is
normal to the manifold $X$. Once it reaches the medial axis $M$, it stays in $M$. We show that $h_L$ increases
along the flow line of $d_X$ in the $\alpha$-offset that we are interested in. This, in turn, implies that the $\alpha$-offset of
our interest avoids the critical points of $h_L$.
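
For intuition, the scaled distance can be evaluated directly when both the manifold and the set $L$ are replaced by finite point lists. The sketch below makes exactly that assumption (finite proxies `manifold_pts` and `lean_pts`, both hypothetical names) and is meant only to illustrate the formula of Definition 3.1 and membership in an $\alpha$-offset.

```python
import math

def scaled_distance(x, manifold_pts, lean_pts):
    """h_L(x) = d(x, X) / (d(x, X) + d(x, L)) on finite proxies of X and L.

    The value lies in [0, 1]: it is 0 on X and 1 on L. The denominator is
    positive as long as the two point lists are disjoint.
    """
    d = min(math.dist(x, y) for y in manifold_pts)
    d_bar = min(math.dist(x, v) for v in lean_pts)
    return d / (d + d_bar)

def in_offset(x, manifold_pts, lean_pts, alpha):
    """Membership test for the alpha-offset, i.e. h_L(x) <= alpha."""
    return scaled_distance(x, manifold_pts, lean_pts) <= alpha

# Toy usage: one manifold point at the origin, one lean point at height 1.
X_proxy = [(0.0, 0.0)]
L_proxy = [(0.0, 1.0)]
print(scaled_distance((0.0, 0.25), X_proxy, L_proxy))
```

A point one quarter of the way from the manifold to the lean set has scaled distance 0.25, independent of the absolute scale, which is the adaptivity the definition is designed to provide.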

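Proposition 3.2 For $\theta \le \frac{\pi}{4}$, $\alpha < \frac{\cos 2\theta}{1 + \cos 2\theta}$, and $M_\theta \cap X_\alpha = \emptyset$, the function $h_L$ increases along the flow line
on the piece $X_\alpha \cap F(x, [0, \infty))$ where $x$ is any point in $X_\alpha \setminus X$.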

Proof: First, observe that, due to Proposition 2.8, we can assert that $X_\alpha \setminus X$ contains no critical point of $d_X$
since $X_\alpha \cap M_\theta = \emptyset$ and $\theta \le \frac{\pi}{4}$. Therefore, flow lines for every point $x \in X_\alpha \setminus X$ are (topological) segments.
Consider an arbitrary point $y = F(x, t)$ such that $y \in X_\alpha$. Set $d = d(y, X)$ and $\bar{d} = d(y, L)$. Since $y \in X_\alpha$,
we have

$$h_L(y) \le \alpha \;\Rightarrow\; \frac{\bar{d}}{d} \ge \frac{1 - \alpha}{\alpha}. \quad (3)$$

For arbitrarily small $\Delta t > 0$, let $\Delta d$ and $\Delta \bar{d}$ denote the changes in the distances $d$ and $\bar{d}$ respectively when
we move on the flow line from $y = F(x, t)$ to $y' = F(x, t + \Delta t)$. Observe that by the triangle inequality,
$\Delta \bar{d} = |d(y, L) - d(y', L)| \le d(y, y')$. We claim that $\Delta d \ge d(y, y') \cdot \cos 2\phi$ where $\phi$ is the maximum angle
so that any point of $X_\alpha \cap M$ belongs to $M_\phi$.

The flow line $F(x, [0, \infty))$ follows a direction that is normal to the manifold
$X$ when it does not lie in the medial axis $M$ of $X$. If $y$ lies on a portion
of the flow line which is normal to the manifold $X$, then it is easy to see
that $\Delta d = |d(y, X) - d(y', X)| = d(y, y') \ge d(y, y') \cdot \cos 2\phi$. If $y$ lies
on a portion of the flow line which is contained in the medial axis $M$, then
the definition of $\phi$ implies that, for any two points $z_1, z_2 \in \Pi(y)$, the angle
$\angle z_1 y z_2 \le 2\phi$. At the same time, it is known that if $y \in M$, then the flow
direction $\nabla d(y)$ at $y = F(x, t)$ points in the direction of $\vec{oy}$ where $o$ is
the center of the minimum enclosing ball for $\Pi(y)$ (see e.g., dl). In fact,
$o$ must be contained in the convex hull of points in $\Pi(y)$. This further implies that there exists a pair of
points $z_1, z_2 \in \Pi(y)$ so that the angle between $\vec{yo}$ and $\vec{yz}$ for any $z \in \Pi(y)$ is at most the angle $\angle z_1 y z_2$,
which is at most $2\phi$. See the figure for an illustration where $\Pi(y) = \{z, z_1, z_2, z_3\}$ (the figure also marks the
intersection of the tangent space of $X$ at $z$ with the plane spanned by $o, y, z$). Hence, in the limit as $y' \to y$,
$\Delta d \to d(y, y') \cdot \cos \angle oyz$ for some $z \in \Pi(y)$, implying $\Delta d \ge d(y, y') \cdot \cos(2\phi)$.

Finally, note that in the claim, we require that $M_\theta \cap X_\alpha = \emptyset$. By definition of $\phi$, this means that $\theta \ge \phi$.
Hence, for $\theta \le \frac{\pi}{4}$,

$$\frac{\cos 2\theta}{1 + \cos 2\theta} = 1 - \frac{1}{1 + \cos 2\theta} \le 1 - \frac{1}{1 + \cos 2\phi}.$$

The condition $\alpha < \frac{\cos 2\theta}{1 + \cos 2\theta}$ now provides that $\alpha < 1 - \frac{1}{1 + \cos 2\phi}$, that is, $\frac{1}{\cos 2\phi} < \frac{1 - \alpha}{\alpha}$. It follows that:

$$\frac{\Delta \bar{d}}{\Delta d} \le \frac{1}{\cos 2\phi} < \frac{1 - \alpha}{\alpha} \le \frac{\bar{d}}{d} \;\Rightarrow\; \frac{\bar{d} + \Delta \bar{d}}{d + \Delta d} < \frac{\bar{d}}{d} \;\Rightarrow\; h_L(F(x, t)) < h_L(F(x, t + \Delta t)).$$
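
The admissible range of $\alpha$ in this argument is easy to evaluate numerically. The helper names below are illustrative only; the code simply evaluates the bound $\frac{\cos 2\theta}{1 + \cos 2\theta}$ and checks the intermediate inequality $\frac{1}{\cos 2\phi} \le \frac{1-\alpha}{\alpha}$ used in the chain above, assuming $0 \le \phi < \frac{\pi}{4}$ and $0 < \alpha < 1$.

```python
import math

def alpha_threshold(theta):
    """Upper bound cos(2*theta) / (1 + cos(2*theta)) on alpha."""
    c = math.cos(2.0 * theta)
    return c / (1.0 + c)

def ratio_bound_holds(alpha, phi):
    """Check 1 / cos(2*phi) <= (1 - alpha) / alpha.

    Requires 0 <= phi < pi/4 (so that cos(2*phi) > 0) and 0 < alpha < 1.
    """
    return 1.0 / math.cos(2.0 * phi) <= (1.0 - alpha) / alpha

# The threshold shrinks from 1/2 (theta = 0) towards 0 (theta = pi/4).
for theta in (0.0, math.pi / 8, math.pi / 6):
    print(theta, alpha_threshold(theta))
```

Since the bound tends to 0 as $\theta$ approaches $\frac{\pi}{4}$, a useful offset parameter requires $\theta$ to stay strictly below $\frac{\pi}{4}$.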


Now, we will show that the $\alpha$-offset $X_\alpha$ remains homotopy equivalent to $X$ if $\alpha$ is chosen appropriately.
For the standard distance function $d_X$, such a result is well known [8, 10]. Here, we need the result for the
scaled distance $h_L$ which we establish using Proposition 3.2 and the critical point theory of Grove [16]. The
isotopy lemma of Grove [16] provides the partial result that $X_\alpha$ is homotopy equivalent to a smaller offset
$X_{\alpha'}$, $\alpha' < \alpha$. Then we argue that $X_{\alpha'}$ is homotopy equivalent to $X$ when $\alpha'$ is sufficiently small.


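Proposition 3.3 Let $\theta \le \frac{\pi}{4}$ and $\alpha < \frac{\cos 2\theta}{1 + \cos 2\theta}$. Let $X_\alpha$ be as defined in Proposition 3.2 where $X_\alpha \cap M_\theta = \emptyset$.
Then, $X_\alpha$ is homotopy equivalent to $X$ and hence $H_i(X_\alpha) = H_i(X)$ for each dimension $i \ge 0$.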


Proof: Consider a real $\alpha'$ where $0 < \alpha' < \alpha$. Let $B = \mathrm{Closure}\,(X_\alpha \setminus X_{\alpha'})$. Any point $x \in B$ has a flow line
$F(x, [0, t])$ along which $h_L$ strictly increases (Proposition 3.2). In particular, there is a unit vector originating
at $x$ along which the directional derivative of $h_L$ does not vanish. Therefore, $B$ does not contain any critical point of $h_L$. Applying
the isotopy lemma of Grove [16], we conclude that $B$ deformation retracts to the bounding hypersurface
$h_L^{-1}(\alpha')$ of $X_{\alpha'}$. The resulting homotopy equivalence can be extended to a map $r : X_\alpha = B \cup X_{\alpha'} \to
h_L^{-1}(\alpha') \cup X_{\alpha'} = X_{\alpha'}$ by restricting $r$ to identity on $X_{\alpha'}$. It follows that $r$ is a homotopy equivalence.

For any point $x \in X_{\alpha'} \setminus X$, a flow line $F(x, [0, t])$ cannot re-enter $X_{\alpha'}$ once it exits because of the
monotonicity of $h_L$. This means $F(x, [0, t])$ intersects $X_{\alpha'}$ in one connected segment. Let $x'$ be the unique
point where $F(x, [0, t])$ intersects the hypersurface $h_L^{-1}(\alpha')$. Since $X$ is compact and smooth, by choosing
$\alpha' > 0$ sufficiently small, one can ensure that $F(x, [0, t]) \cap X_{\alpha'}$ lies on the normal line segment $x\Pi(x)$, for
all $x \in X_{\alpha'} \setminus X$. It implies that $X_{\alpha'}$ intersects the normal lines to $X$ in a connected segment along which $X_{\alpha'}$
can be retracted to $X$, completing the proof. ■


3.2 Interleaving and inference 

Our goal is to interleave the $\alpha$-offsets of $h_L$ with the union of a set of balls $\cup B$ centered at the sample points
because then, following the approach in [11], we can relate the topology of the nerve complex of $\cup B$ with
that of X. For the distance function $d_X$, the offsets restricted to the sample P provide the required set of balls
because $d_X|_P$ approximates $d_X$. Unfortunately, offsets of $h_L$ restricted to P are not necessarily unions of
geometric balls centered at points in P. Nevertheless, we show that a set of balls whose radii are proportional
to the distances to L has the necessary property.

First, we consider the union of balls, one for every point in X. Let $\cup B_\alpha$ denote the union of balls $B(x, r)$
for every $x \in X$ where $r = \alpha\, d(x, L)$. One has the following interleaving result.


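Proposition 3.4 $X_{\frac{\alpha}{1+2\alpha}} \subseteq \cup B_\alpha \subseteq X_\alpha$.

Proof: First we show the left inclusion. Let x be any point in $X_{\frac{\alpha}{1+2\alpha}}$, and y an arbitrary point from $n(x)$
(i.e., $d(x, y) = d(x, n(x))$). Then we have,

$$\frac{d(x,y)}{2d(x,y) + d(y,L)} = \frac{d(x,y)}{d(x,y) + (d(x,y) + d(y,L))} \le \frac{d(x,y)}{d(x,y) + d(x,L)} \le \frac{\alpha}{1+2\alpha}, \quad \text{since } x \in X_{\frac{\alpha}{1+2\alpha}}.$$

It then follows that

$$(1 + 2\alpha)\, d(x,y) \le 2\alpha\, d(x,y) + \alpha\, d(y,L) \;\Rightarrow\; d(x,y) \le \alpha\, d(y,L) \;\Rightarrow\; x \in \cup B_\alpha.$$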


We now prove the second inclusion. Let x be any point in $\cup B_\alpha$. Let $z \in X$ be a point so that
$x \in B(z, \alpha\, d(z, L))$; that is, $d(x, z) \le \alpha\, d(z, L)$. Such a point exists by the definition of $\cup B_\alpha$. Using triangle
inequality, we have:

$$h_L(x) = \frac{d(x,X)}{d(x,X) + d(x,L)} \le \frac{d(x,z)}{d(x,z) + d(x,L)} \le \frac{d(x,z)}{d(z,L)} \le \frac{\alpha\, d(z,L)}{d(z,L)} = \alpha,$$

and hence $x \in X_\alpha$.
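The two inclusions of Proposition 3.4 can be sanity-checked numerically in a toy setting. The sketch below is only an illustration under assumptions that are not from the paper: X is taken to be the unit circle in the plane and L is the single point at the origin, so that $d(x, X) = |r - 1|$ and $d(x, L) = r$ for a point of norm r, and every ball $B(x, \alpha\, d(x, L))$ has radius $\alpha$. The function names are hypothetical.

```python
def h_L(r):
    """Scaled distance h_L at a point of norm r (X = unit circle, L = {origin})."""
    d_X = abs(r - 1.0)   # distance to the unit circle
    d_L = r              # distance to the single point of L
    return d_X / (d_X + d_L)

def in_union_of_balls(r, alpha):
    """Membership in the union of balls B(x, alpha * d(x, L)) over x on the circle.
    Since d(x, L) = 1 on the circle, the union is the closed annulus of width alpha."""
    return abs(r - 1.0) <= alpha

def check_interleaving(alpha, samples=2000, tol=1e-9):
    """Check X_{alpha/(1+2 alpha)} inside the union inside X_alpha on a grid of radii."""
    inner = alpha / (1.0 + 2.0 * alpha)
    for k in range(1, samples + 1):
        r = 2.0 * k / samples                      # radii in (0, 2]
        if h_L(r) <= inner - tol and not in_union_of_balls(r, alpha):
            return False                           # left inclusion violated
        if in_union_of_balls(r, alpha) and h_L(r) > alpha + tol:
            return False                           # right inclusion violated
    return True
```

In this toy case the left inclusion is tight on the outside of the circle: a point of norm $1 + \alpha$ has $h_L$ exactly $\frac{\alpha}{1+2\alpha}$, which is why the check uses a small tolerance.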


We extend the above interleaving result to the union of balls whose centers are restricted only to a sample
$P \subset X$. For convenience we define the following sampling condition, closely related to the $\varepsilon$-dense sampling
condition.


Definition 3.5 A finite set $P \subset X$ is a $(\delta, L)$-sample of X if every point $x \in X$ has a point $p \in P$ so that
$d(x, p) \le \delta\, d(p, L)$. Furthermore, let $\cup P_\alpha = \bigcup_{p \in P} B(p, \alpha\, d(p, L))$ denote the union of scaled balls around
sample points in P.
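As an illustration of Definition 3.5, the following sketch computes the smallest $\delta$ for which a finite set P is a $(\delta, L)$-sample of a densely discretized stand-in for X, together with the radii of the scaled balls. The function names and the use of a dense point set `X_dense` in place of the continuous manifold are assumptions made for this example only.

```python
import numpy as np

def dist_to_set(points, L):
    """Distance from each row of `points` to the finite set L (rows of L)."""
    diff = points[:, None, :] - L[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)

def smallest_delta(X_dense, P, L):
    """Smallest delta such that every x in X_dense has some p in P with
    d(x, p) <= delta * d(p, L). Requires d(p, L) > 0, i.e. L disjoint from X."""
    d_pL = dist_to_set(P, L)                        # d(p, L) for each sample point
    diff = X_dense[:, None, :] - P[None, :, :]
    d_xp = np.sqrt((diff ** 2).sum(axis=2))         # pairwise distances d(x, p)
    ratios = d_xp / d_pL[None, :]                   # d(x, p) / d(p, L)
    return ratios.min(axis=1).max()                 # best p per x, worst x overall

def ball_radii(P, L, alpha):
    """Radii alpha * d(p, L) of the balls whose union is denoted by the paper as UP_alpha."""
    return alpha * dist_to_set(P, L)
```

Because the radius of each ball scales with $d(p, L)$, a single value of $\alpha$ yields small balls where the sample is close to L and large balls elsewhere.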

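Remark 3.6 A $\delta$-dense sample w.r.t. $\mathrm{lnfs}_\beta$ is also a $(\frac{\delta}{1-\delta}, L_\beta)$-sample of X. Conversely, a $(\delta, L_\beta)$-sample
of X is also a $\frac{\delta}{1-\delta}$-dense sample w.r.t. $\mathrm{lnfs}_\beta$. These follow from the fact that $\mathrm{lnfs}_\beta$ is 1-Lipschitz.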


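Proposition 3.7 For a $(\delta, L)$-sample P of X and any $\alpha > 0$, we have $X_{\frac{\alpha}{1+2\alpha}} \subseteq \cup P_{\alpha+\delta+\alpha\delta} \subseteq X_{\alpha+\delta+\alpha\delta}$.

Proof: Recall that by definition $\cup B_\alpha = \bigcup_{x \in X} B(x, \alpha\, d(x, L))$. By the $(\delta, L)$-sampling condition of P, as
well as triangle inequality, we have $\cup B_\alpha \subseteq \cup P_{\alpha+\delta+\alpha\delta}$. Combining this with the left inclusion in Proposition 3.4, we have

$$X_{\frac{\alpha}{1+2\alpha}} \subseteq \cup B_\alpha \subseteq \cup P_{\alpha+\delta+\alpha\delta}.$$

The second inclusion follows because $\cup P_{\alpha+\delta+\alpha\delta} \subseteq \cup B_{\alpha+\delta+\alpha\delta} \subseteq X_{\alpha+\delta+\alpha\delta}$ (Proposition 3.4).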
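The triangle-inequality step behind the inclusion of the union of balls centered on X into the union of balls centered on P (with scale $\alpha+\delta+\alpha\delta$) can be spelled out as follows. This is a worked sketch, not the authors' wording: it assumes y lies in a ball $B(x, \alpha\, d(x, L))$ with $x \in X$, that $p \in P$ is a sample point with $d(x, p) \le \delta\, d(p, L)$, and it uses that $d(\cdot, L)$ is 1-Lipschitz.

```latex
\begin{align*}
d(y, p) &\le d(y, x) + d(x, p) \\
        &\le \alpha\, d(x, L) + \delta\, d(p, L) \\
        &\le \alpha\,\bigl(d(p, L) + d(x, p)\bigr) + \delta\, d(p, L) \\
        &\le \alpha\,(1 + \delta)\, d(p, L) + \delta\, d(p, L)
         = (\alpha + \delta + \alpha\delta)\, d(p, L).
\end{align*}
```

Hence y lies in the ball of radius $(\alpha + \delta + \alpha\delta)\, d(p, L)$ around p, which is the claimed inclusion.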


With the isomorphisms in the homology groups of the offset of our scaled distance function (Proposition
3.3) and the interleaving result (Proposition 3.7), we can infer the homology of the hidden manifold X from
the union of balls $\cup P_\alpha$.


Suppose that P is a $(\delta, L)$-sample of the manifold X. Recall that $\cup P_\alpha$ denotes the union of balls
$\bigcup_{p \in P} B(p, \alpha\, d(p, L))$ centered at each point $p \in P$, with radius $\alpha\, d(p, L)$. Note that the parameter $\alpha$ does
not stand for a distance threshold, but for a scale parameter for the distance $d(p, L)$. This parameter is universal
for all points, while the distance $d(p, L)$ makes the union of balls adaptive.

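By manipulating the result in Proposition 3.7, one obtains that, for $\alpha + \delta < \frac{1}{4}$ and $\alpha' = \frac{\alpha}{2(1-\alpha)}$,

$$X_{\frac{\alpha}{2}} \subseteq \cup P_{\alpha'+\delta+\alpha'\delta} \subseteq \cup P_{\frac{(1+\delta)\alpha}{2(1-\alpha)}+\delta} \subseteq \cup P_{\alpha+\delta}.$$

When $\alpha + \delta < \frac{1}{6}$ and $\alpha' = \frac{\alpha+\delta}{1-2(\alpha+\delta)}$, similar manipulation gives

$$X_{\alpha+\delta} \subseteq \cup P_{\alpha'+\delta+\alpha'\delta} \subseteq \cup P_{\frac{(1+\delta)(\alpha+\delta)}{1-2(\alpha+\delta)}+\delta} \subseteq \cup P_{3(\alpha+\delta)}.$$

So, for $\alpha + \delta < \frac{1}{6}$, we obtain

$$X_{\frac{\alpha}{2}} \subseteq \cup P_{\alpha+\delta} \subseteq X_{\alpha+\delta} \subseteq \cup P_{3(\alpha+\delta)} \subseteq X_{3(\alpha+\delta)}$$

which leads to inclusion-induced homomorphisms at the homology level that interleave:

$$H_i(X_{\frac{\alpha}{2}}) \to H_i(\cup P_{\alpha+\delta}) \to H_i(X_{\alpha+\delta}) \to H_i(\cup P_{3(\alpha+\delta)}) \to H_i(X_{3(\alpha+\delta)}) \qquad (4)$$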


On the other hand, if $3(\alpha + \delta) < \frac{\cos 2\theta}{1+\cos 2\theta}$ and $X_{3(\alpha+\delta)} \cap M_\theta = \emptyset$, we can use Proposition 3.3 and Lemma
3.2 in [11] to claim that

$$\mathrm{image}\,\bigl(H_i(\cup P_{\alpha+\delta}) \to H_i(\cup P_{3(\alpha+\delta)})\bigr) \cong H_i(X).$$

Let $C^\alpha(P)$ denote the nerve of $\cup P_\alpha$. One can recognize the resemblance between $C^\alpha(P)$ and the well-known
Čech complex. Both are nerves of unions of closed balls, but unlike Čech complexes, $C^\alpha(P)$ is the
nerve of a union of balls that may have different radii; recall that $\alpha$ denotes a fraction relative to a distance
rather than an absolute distance. The Nerve Lemma [5] provides that $C^\alpha(P)$ is homotopy equivalent to
$\cup P_\alpha$. Also, the argument of Chazal and Oudot [11] to prove Theorem 3.5 can be extended to claim that for
any $i \ge 0$,

$$\mathrm{rank}\bigl(H_i(C^{\alpha+\delta}(P)) \to H_i(C^{3(\alpha+\delta)}(P))\bigr) = \mathrm{rank}\bigl(H_i(\cup P_{\alpha+\delta}) \to H_i(\cup P_{3(\alpha+\delta)})\bigr) = \mathrm{rank}\, H_i(X).$$


The complex $C^\alpha(P)$ interleaves with another complex $R^\alpha(P)$ in a way that is reminiscent of the interleaving of
the Čech with the Vietoris-Rips complexes. Specifically, let

$$R^\alpha(P) := \{\sigma \mid d(p, q) \le \alpha\,(d(p, L) + d(q, L)) \text{ for every edge } pq \text{ of } \sigma\}.$$

It is easy to observe that $R^\alpha(P)$ is the completion of the 1-skeleton of $C^\alpha(P)$ and the following inclusions
hold as in the case of the original Čech and Vietoris-Rips complexes.

$$C^\alpha(P) \subseteq R^\alpha(P) \subseteq C^{2\alpha}(P) \text{ for any } \alpha > 0.$$
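The definition of the scaled Rips-like complex translates directly into code. The following is a minimal sketch that builds its edges and triangles by brute force; the function names are hypothetical and `radii[i]` is assumed to hold the precomputed distance $d(p_i, L)$ of the i-th sample point to L.

```python
import itertools
import numpy as np

def scaled_rips_edges(P, radii, alpha):
    """Edges pq with d(p, q) <= alpha * (d(p, L) + d(q, L)); radii[i] = d(P[i], L)."""
    edges = set()
    for i, j in itertools.combinations(range(len(P)), 2):
        if np.linalg.norm(P[i] - P[j]) <= alpha * (radii[i] + radii[j]):
            edges.add((i, j))                      # stored with i < j
    return edges

def scaled_rips_triangles(n, edges):
    """Triangles of the clique (flag) completion of the given edge set on n vertices."""
    return [t for t in itertools.combinations(range(n), 3)
            if all(e in edges for e in itertools.combinations(t, 2))]
```

Higher-dimensional simplices are obtained the same way, as cliques of the edge graph; the brute-force enumeration is cubic in the number of points and is meant for small examples only.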

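Now, by choosing $\alpha + \delta < \frac{1}{6}\frac{\cos 2\theta}{1+\cos 2\theta}$ (which also implies $\alpha + \delta < \frac{1}{12}$ since $\frac{\cos 2\theta}{1+\cos 2\theta} \le \frac{1}{2}$), we have a
sequence similar to (4) that eventually induces the following sequence:

$$H_i(C^{\alpha+\delta}(P)) \to H_i(R^{\alpha+\delta}(P)) \to H_i(C^{2(\alpha+\delta)}(P)) \to H_i(C^{6(\alpha+\delta)}(P)) \to H_i(R^{6(\alpha+\delta)}(P)) \to H_i(C^{12(\alpha+\delta)}(P)).$$

In particular, following a similar argument as before, we have that

$$\mathrm{rank}\bigl(H_i(C^{\alpha+\delta}(P)) \to H_i(C^{12(\alpha+\delta)}(P))\bigr) = \mathrm{rank}\bigl(H_i(C^{2(\alpha+\delta)}(P)) \to H_i(C^{6(\alpha+\delta)}(P))\bigr) = \mathrm{rank}\, H_i(X)$$

as long as $12(\alpha + \delta) < \frac{\cos 2\theta}{1+\cos 2\theta}$ and $X_{12(\alpha+\delta)} \cap M_\theta = \emptyset$. By using the standard results of interleaving [11]
on this sequence, we obtain that

$$\mathrm{rank}\bigl(H_i(R^{\alpha+\delta}(P)) \to H_i(R^{6(\alpha+\delta)}(P))\bigr) = \mathrm{rank}\, H_i(X).$$

Theorem 3.8 For a finite set L in the ambient Euclidean space where $L \cap X = \emptyset$, let P be a $(\delta, L)$-sample of the manifold X.
Let $\theta < \frac{\pi}{4}$, and $\alpha + \delta < \frac{1}{12}\frac{\cos 2\theta}{1+\cos 2\theta}$. If $X_{12(\alpha+\delta)} \cap M_\theta = \emptyset$, then $\mathrm{rank}(H_i(X)) = \mathrm{rank}\bigl(H_i(R^{\alpha+\delta}(P)) \to H_i(R^{6(\alpha+\delta)}(P))\bigr)$, for any $i \ge 0$.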

3.3 Computations for topology inference 


Algorithm 2 LeanTopo(P)

1: $\beta := \frac{\pi}{5}$; $Q := $ Lean$(P, \beta, \rho)$

2: Compute the complexes $R^{2\rho}(Q)$ and $R^{12\rho}(Q)$.

3: Compute the persistence induced by the inclusion $R^{2\rho}(Q) \to R^{12\rho}(Q)$.
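Steps 2 and 3 can be illustrated for dimension one with plain linear algebra over GF(2). The sketch below is not the authors' implementation: it takes the sparsified points and their distances to the lean set as given (the procedure Lean is not reproduced), builds the two flag complexes by brute force, and computes the rank of the map $H_1(R^{small}) \to H_1(R^{large})$ as $\dim Z_1(R^{small}) - \dim(Z_1(R^{small}) \cap B_1(R^{large}))$. All function and parameter names are hypothetical.

```python
import itertools
import math

def gf2_rank(vectors):
    """Rank over GF(2) of vectors encoded as Python ints (bit i = coordinate i)."""
    pivots = {}
    for v in vectors:
        while v:
            top = v.bit_length() - 1
            if top not in pivots:
                pivots[top] = v
                break
            v ^= pivots[top]                       # eliminate the leading bit
    return len(pivots)

def scaled_rips(points, radii, scale):
    """Edges and triangles of the flag complex with d(p, q) <= scale * (r_p + r_q)."""
    n = len(points)
    edges = [(i, j) for i, j in itertools.combinations(range(n), 2)
             if math.dist(points[i], points[j]) <= scale * (radii[i] + radii[j])]
    eset = set(edges)
    tris = [t for t in itertools.combinations(range(n), 3)
            if all(e in eset for e in itertools.combinations(t, 2))]
    return edges, tris

def persistent_h1_rank(points, radii, small, large):
    """Rank over GF(2) of H_1(R^small) -> H_1(R^large) induced by inclusion."""
    e_small, _ = scaled_rips(points, radii, small)
    e_large, t_large = scaled_rips(points, radii, large)
    index = {e: k for k, e in enumerate(e_large)}
    in_small = set(e_small)
    # dim Z_1(R^small) = number of edges - rank of the vertex-edge boundary matrix.
    d1 = [(1 << i) | (1 << j) for i, j in e_small]
    dim_cycles = len(e_small) - gf2_rank(d1)
    # Columns of the edge-triangle boundary matrix of R^large, as bitmasks over its edges.
    d2 = [sum(1 << index[e] for e in itertools.combinations(t, 2)) for t in t_large]
    # Keep only the rows of edges that are not present in the small complex.
    rest = sum(1 << index[e] for e in e_large if e not in in_small)
    d2_rest = [col & rest for col in d2]
    # Boundaries of R^large supported on edges of R^small: rank(d2) - rank(d2_rest).
    killed = gf2_rank(d2) - gf2_rank(d2_rest)
    return dim_cycles - killed
```

For twelve points evenly spaced on a unit circle with all radii equal to one, the scales 0.3 and 0.6 give a cycle that survives in the larger complex, whereas a large scale such as 1.1 fills the cycle in. A production implementation would use a persistence library and the scales $2\rho$ and $12\rho$ prescribed by the algorithm.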


In step 3 of LeanTopo, we compute the persistence homology induced by the inclusion $R^{2\rho}(Q) \to R^{12\rho}(Q)$
where $\rho = \frac{1}{26}\frac{\cos 2\beta}{1+\cos 2\beta}$. When the parameter $\varepsilon$ is sufficiently small and $\beta = \frac{\pi}{5}$, we can find a value
$\theta$ such that $\frac{\pi}{4} > \theta > \beta + \frac{3}{2}\sqrt{\varepsilon} + \nu_\varepsilon$ and $2\rho = \frac{1}{13}\frac{\cos 2\beta}{1+\cos 2\beta} \le \frac{1}{12}\frac{\cos 2\theta}{1+\cos 2\theta}$. This is precisely what is needed
for the homology inference in Theorem 3.8. More specifically, recall by Eqn. (7) in the proof of Theorem
2.12 that the output sparsified set of points Q is a $(\delta, L_\beta)$-sample for $\delta = \frac{6}{5}\rho$. The algorithm implicitly sets
$\alpha = 2\rho - \delta = \frac{4}{5}\rho$ such that $\alpha + \delta = 2\rho < \frac{1}{12}\frac{\cos 2\theta}{1+\cos 2\theta}$, when $\varepsilon$ is sufficiently small. Theorem 3.8 requires
further that the offset $X_{\alpha'} := h_{L_\beta}^{-1}[0, \alpha']$ is disjoint from $M_\theta$ for $\alpha' = 12(\alpha + \delta)$, which we establish using
the following proposition.


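Proposition 3.9 Let $\alpha' < \frac{1}{1+\cos\theta+\varepsilon}$ and $\theta$ be such that $\frac{\pi}{4} > \theta > \frac{\pi}{5} + \frac{3}{2}\sqrt{\varepsilon} + \nu_\varepsilon$ for a sufficiently small
$\varepsilon \le \frac{1}{8}\sin^2\theta$. Then, $M_\theta \cap X_{\alpha'} = \emptyset$.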


Proof: We prove the result by contradiction. Assume that there exists a point $x \in M_\theta \cap X_{\alpha'}$. Define
m and s as in the proof of Proposition 2.7. With $\beta = \frac{\pi}{5}$, the assumed conditions for $\theta$, $\beta$, $\varepsilon$ are the same as in
Proposition 2.7 and thus we can arrive at the corresponding inequality in its proof. Since s is a closest point of m in X,
we have $d(s, m) = d(m, X) \le d(x, m) + d(x, X)$. Combining this with that inequality, it follows that, for any
$x \in X_{\alpha'}$,

$$d(x, L_\beta) \le (1 + \cos\theta + \varepsilon) \cdot d(x, m) + (\cos\theta + \varepsilon) \cdot d(x, X).$$

Since $x \in X_{\alpha'}$, $h_{L_\beta}(x) \le \alpha'$ implies that $d(x, X) \le \frac{\alpha'}{1-\alpha'}\, d(x, L_\beta)$. Hence $d(x, L_\beta) \le c \cdot d(x, m) = c \cdot d(x, M_\theta)$
for the positive constant

$$c = \frac{1+\cos\theta+\varepsilon}{1 - \frac{\alpha'}{1-\alpha'}(\cos\theta+\varepsilon)} = 1 + \frac{\cos\theta+\varepsilon}{1 - \alpha'(1+\cos\theta+\varepsilon)}.$$

On the other hand, since $x \in M_\theta \cap X_{\alpha'}$ and since $X \cap M_\theta$ is empty, $x \notin X$. Thus, $d(x, X) > 0$. Since
$x \in X_{\alpha'}$ and $h_{L_\beta}(x) \le \alpha'$, we have that $d(x, L_\beta) \ge \frac{1-\alpha'}{\alpha'}\, d(x, X)$. Hence $d(x, L_\beta) > 0$ as well since $\alpha' < 1$.

This further implies that $d(x, M_\theta) > 0$ because according to the above derivation, $d(x, M_\theta) \ge \frac{1}{c}\, d(x, L_\beta)$
for $c > 0$. This however contradicts the fact that $x \in M_\theta \cap X_{\alpha'} \subseteq M_\theta$. Hence our assumption is wrong and
there is no such point $x \in M_\theta \cap X_{\alpha'}$. ■


Theorem 3.10 Let X be a smooth compact manifold without boundary of known intrinsic dimension.
Let P be an $\varepsilon$-dense sample of X w.r.t. lfs. LeanTopo(P) computes the rank of $H_i(X)$ for any $i \ge 0$ when
$\varepsilon$ is sufficiently small.

Proof: Since $\frac{\cos 2\theta}{1+\cos 2\theta} \le \frac{1}{1+\cos\theta+\varepsilon}$ for $\theta > \frac{\pi}{5}$ and small enough $\varepsilon$, one has the fact that $\alpha' = 12(\alpha + \delta) < \frac{\cos 2\theta}{1+\cos 2\theta}$
implies $\alpha' < \frac{1}{1+\cos\theta+\varepsilon}$. This means that the parameters $\alpha$, $\delta$ and $\theta$ set by the algorithm
LeanTopo implicitly or explicitly satisfy the conditions required by Proposition 3.9. Hence, $X_{12(\alpha+\delta)} \cap M_\theta = \emptyset$.
Therefore, all conditions for Theorem 3.8 hold for the sparsified set Q output by Lean, and it then
follows that $\mathrm{rank}(H_i(X)) = \mathrm{rank}\bigl(H_i(R^{2\rho}(Q)) \to H_i(R^{12\rho}(Q))\bigr)$ for any $i \ge 0$. ■

We remark that a particularly interesting feature of Algorithm LeanTopo is that we only need to set
the parameter $\beta$ to a universal constant $\frac{\pi}{5}$. All other parameters such as the angle and radius conditions for
choosing $\beta$-good pairs and the decimation radius are determined by this choice of the angle $\beta$. This makes
LeanTopo parameter-free; see also our experimental results in Section 4. At the same time, the above
theorem states that its output is guaranteed to be correct as the input set of samples P becomes sufficiently
dense.
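To make the dependence on the single angle concrete, the following sketch evaluates the scales used by the algorithm. It assumes, as read from the text, that $\beta = \frac{\pi}{5}$ and $\rho = \frac{1}{26}\frac{\cos 2\beta}{1+\cos 2\beta}$, and it computes the largest $\theta$ for which $2\rho \le \frac{1}{12}\frac{\cos 2\theta}{1+\cos 2\theta}$ still holds. The helper name `scale_bound` is hypothetical.

```python
import math

def scale_bound(angle):
    """cos(2*angle) / (1 + cos(2*angle)), the bound appearing in the offset conditions."""
    c = math.cos(2.0 * angle)
    return c / (1.0 + c)

beta = math.pi / 5                      # the universal angle (assumed value)
rho = scale_bound(beta) / 26.0          # decimation scale derived from beta
small, large = 2.0 * rho, 12.0 * rho    # scales of the two complexes in steps 2 and 3

# Largest theta with (1/13) * bound(beta) <= (1/12) * bound(theta).
# Solving t = c / (1 + c) for c = cos(2 * theta) gives c = t / (1 - t).
target = 12.0 / 13.0 * scale_bound(beta)
theta_max = 0.5 * math.acos(target / (1.0 - target))

print(f"rho = {rho:.6f}, 2*rho = {small:.6f}, 12*rho = {large:.6f}")
print(f"beta = {math.degrees(beta):.2f} deg, theta_max = {math.degrees(theta_max):.2f} deg")
```

Under these assumptions the admissible window for $\theta$ above $\beta$ is narrow (less than one degree), which is consistent with the requirement that $\varepsilon$ be sufficiently small.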


4 Experiments and discussion 


We experimented with LeanTopo primarily on curve and surface samples. We used thresholds for sparsification
that are more aggressive than predicted by our analysis. For example, our analysis predicts that for
$\beta = \frac{\pi}{5}$, the constant $c_\beta = \frac{1}{3}\tan\frac{\beta}{2} \approx 0.11$, but we kept it at 0.5. We kept the same thresholds for all models
to ensure that we do not fine-tune them for different inputs. The decimation ratio is kept at 0.5, and the
r for computing the complex $R^r$ is kept at 0.65 in all cases. Table 1 below shows the details. The rank of
$H_1$ homology is computed correctly by our algorithm for all these data. The sparsified points are shown in
Figure 1.


Name          input #points   output #points   $c_\beta$   decimation ratio   r for $R^r$   rank $H_1$
MotherChild   126500          5267             0.5         0.5                0.7           8
BOTIJO        101529          7600             0.5         0.5                0.7           10
Kitten        134448          1914             0.5         0.5                0.7           2
CurveHelix    1000            235              0.5         0.5                0.7           1


Table 1: Experiments on a curve and three surface samples. 
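The amount of sparsification reported in Table 1 can be summarized as the fraction of input points that survive. The short sketch below only restates the table's point counts and does the arithmetic; the variable names are illustrative.

```python
# (input #points, output #points) per model, copied from Table 1.
rows = {
    "MotherChild": (126500, 5267),
    "BOTIJO": (101529, 7600),
    "Kitten": (134448, 1914),
    "CurveHelix": (1000, 235),
}

# Fraction of the input sample retained after sparsification.
ratios = {name: out / inp for name, (inp, out) in rows.items()}

for name, r in ratios.items():
    print(f"{name}: {100 * r:.2f}% of the input points retained")
```

The three surface models retain well under a tenth of their input points, while the much smaller curve sample retains about a quarter.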


Extensions. One obvious question that remains open is how to extend the scope of our sparsification
strategy to a larger class of inputs, such as noisy data samples and/or samples from compact spaces rather than
manifolds.

Noise: We observe that, for Hausdorff noise, where samples are assumed to lie within a small offset of 
the manifold, our method can be applied. However, a parameter giving the extent of this Hausdorff noise 
needs to be supplied. With this parameter, one can estimate the normals reliably from the noisy but dense 
sample. The step where we compute the lean set requires an empty ball test which also needs this parameter
because otherwise noise can collaborate to provide a false impression that some spurious manifolds have 
been sampled. Given the ambiguity that a noisy sample can be dense for two topologically different spaces, 
it may be impossible to avoid a parameter that eliminates different such possibilities. Nevertheless, our 
method would free the user from specifying a threshold for building the complexes. 

In an experiment, we added artificial noise to the three surface samples, as shown in Figure 2, to test the
robustness of our algorithm. We added a uniform displacement to each sample point along the normal
direction. The displacement ranged from −0.5% to 0.5% times the diameter of the model. We modified
our algorithm to ignore all lean set points formed by two points closer than a threshold which is picked as a
multiple of the noise scale. Other thresholds were kept the same as in the previous experiment.
Results in Table 2 show that the algorithm can tolerate noise in case there is a known upper limit on the
noise level.
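The noise model described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: it assumes that unit (or normalizable) normals are available for each sample point and uses the bounding-box diagonal as a stand-in for the diameter of the model.

```python
import numpy as np

def add_normal_noise(points, normals, fraction=0.005, seed=0):
    """Displace each point along its normal by a uniform random amount in
    [-fraction, fraction] times the diameter of the point set."""
    rng = np.random.default_rng(seed)
    # Bounding-box diagonal as a cheap proxy for the model diameter (assumption).
    diameter = np.linalg.norm(points.max(axis=0) - points.min(axis=0))
    offsets = rng.uniform(-fraction, fraction, size=len(points)) * diameter
    unit = normals / np.linalg.norm(normals, axis=1, keepdims=True)
    return points + offsets[:, None] * unit
```

With `fraction=0.005` the displacement ranges over ±0.5% of the diameter, matching the range stated in the text.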
Threshold (multiple of noise scale)   0       1       2      3    4    5    6    7    8    9    10   11   12   13
MotherChild                           18196   1636    37     8    8    8    8    8    8    8    8    8    7    7
BOTIJO                                14565   14580   1462   10   10   10   10   10   10   10   10   8    8    8
Kitten                                20506   20572   1314   2    2    2    2    2    2    2    2    2    2    2


Table 2: Experiments on 3 surfaces with artificial noise. The table shows the resulting rank of $H_1$ for each model under
different thresholds. The experiments show that the influence of noise is removed when the threshold is at least
3 times the noise scale. The threshold may introduce problems when it is too large.
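The observation in the caption can be checked mechanically from the table. The sketch below restates the rows of Table 2 and the correct ranks from Table 1, then lists the thresholds for which every model yields the correct rank; the variable names are illustrative.

```python
# Correct rank of H_1 per model, taken from Table 1.
expected = {"MotherChild": 8, "BOTIJO": 10, "Kitten": 2}

# Computed rank of H_1 for thresholds 0..13 (multiples of the noise scale), from Table 2.
table2 = {
    "MotherChild": [18196, 1636, 37, 8, 8, 8, 8, 8, 8, 8, 8, 8, 7, 7],
    "BOTIJO":      [14565, 14580, 1462, 10, 10, 10, 10, 10, 10, 10, 10, 8, 8, 8],
    "Kitten":      [20506, 20572, 1314, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
}

# Thresholds at which each model gives the correct rank.
good = {name: [t for t, rank in enumerate(ranks) if rank == expected[name]]
        for name, ranks in table2.items()}

# Thresholds that work for all three models simultaneously.
common = sorted(set.intersection(*(set(v) for v in good.values())))
print(common)
```

The common range starts at threshold 3, in line with the caption, and ends where the first model (BOTIJO) starts losing cycles.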



Figure 2: Noisy samples. Meshes are created only for rendering.


The more general noise model which allows outliers would also be worthwhile to investigate. One may 
explore the ‘distance to measure’ technique proposed in fj] for this case. But, it is not clear how to adapt the 
entire development in this paper to this setting. One possibility is to eliminate all outliers first to make the 
noise only Hausdorff, and then apply the technique for Hausdorff noise as alluded to in the previous paragraph.
This will certainly require more parameters to be supplied by the user. 

Compacts: The case for compact sets is perhaps more challenging. The normal spaces are not well 
defined everywhere for such spaces. Thus, we need to devise a different strategy to compute the lean set. 
The theory of compacts developed in the context of topology inference in @ may be useful here. Computing 
the lean sets efficiently in high dimensions for compact spaces remains a formidable open problem.
A Missing Proofs


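Proving that $(p, q)$ is $\beta$-good for Proposition 2.7. We know that $\angle(N_s, st) \le \frac{\pi}{2} - \theta$ which implies
that $d(s, t) \ge 2d(s, m)\sin\theta \ge 2\,\mathrm{lfs}(s)\sin\theta$. Consider the triangle pst. By triangle inequality, $d(p, t) \ge
d(s, t) - d(s, p) \ge (2\sin\theta - \varepsilon)\,\mathrm{lfs}(s)$. The angle $\angle pts$ is at most

$$\arcsin\frac{d(p,s)}{d(p,t)} \le \arcsin\frac{\varepsilon\,\mathrm{lfs}(s)}{(2\sin\theta - \varepsilon)\,\mathrm{lfs}(s)} \le \frac{4}{3}\cdot\frac{\varepsilon}{2\sin\theta - \varepsilon}. \qquad (5)$$

The last inequality follows from the fact that $\arcsin(x) \le \frac{4}{3}x$ for $0 \le x \le \frac{1}{2}$. In our case, since
$\sqrt{\varepsilon} \le \frac{1}{2\sqrt{2}}\sin\theta < \frac{1}{2}$, we have that

$$\frac{\varepsilon}{2\sin\theta - \varepsilon} \le \frac{\varepsilon}{4\sqrt{\varepsilon} - \varepsilon} = \frac{\sqrt{\varepsilon}}{4 - \sqrt{\varepsilon}} \le \frac{1}{7} < \frac{1}{2}.$$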

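Now assume without loss of generality that $\mathrm{lfs}(s) \ge \mathrm{lfs}(t)$. Then,

$$d(p, q) \ge d(s, t) - d(p, s) - d(q, t) \ge d(s, t) - 2\varepsilon\,\mathrm{lfs}(s) \ge 2(\sin\theta - \varepsilon)\,\mathrm{lfs}(s).$$

Recall that $d(t, p) \ge (2\sin\theta - \varepsilon)\,\mathrm{lfs}(s)$. Considering the triangle tpq, we have

$$\angle tpq \le \arcsin\frac{d(q,t)}{d(p,t)} \le \arcsin\frac{\varepsilon\,\mathrm{lfs}(t)}{2(\sin\theta - \varepsilon)\,\mathrm{lfs}(s)} \le \arcsin\frac{\varepsilon\,\mathrm{lfs}(s)}{2(\sin\theta - \varepsilon)\,\mathrm{lfs}(s)} \le \frac{4}{3}\cdot\frac{\varepsilon}{2(\sin\theta - \varepsilon)} \qquad (6)$$

where the last inequality follows from a similar argument used for Eqn. (5).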

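We know that $\angle(N_p, N_s) \le \varepsilon$, $\angle(N_p, \tilde{N}_p) \le \nu_\varepsilon$ and $\angle(pq, st) \le \angle pts + \angle tpq$. Combining these
with Eqns. (5) and (6) and the assumption that $\sqrt{\varepsilon} \le \frac{1}{2\sqrt{2}}\sin\theta$, we have that

$$\angle(pq, \tilde{N}_p) \le \angle(pq, st) + \angle(st, N_s) + \angle(N_s, N_p) + \angle(N_p, \tilde{N}_p) \le \frac{4}{3}\cdot\frac{2\varepsilon}{2\sin\theta - 2\varepsilon} + \frac{\pi}{2} - \theta + \varepsilon + \nu_\varepsilon$$

$$\le \frac{4}{3}\cdot\frac{2\sqrt{\varepsilon}}{4\sqrt{2} - 2\sqrt{\varepsilon}} + \frac{\sqrt{\varepsilon}}{2\sqrt{2}} + \frac{\pi}{2} - \theta + \nu_\varepsilon \le \frac{\pi}{2} - \theta + \frac{3}{2}\sqrt{\varepsilon} + \nu_\varepsilon.$$

A similar bound holds for $\angle(pq, \tilde{N}_q)$. It follows that the pair $(p, q)$ satisfies the first condition of being $\beta$-good,
as long as $\frac{\pi}{2} - \theta + \frac{3}{2}\sqrt{\varepsilon} + \nu_\varepsilon \le \frac{\pi}{2} - \beta$. This is guaranteed by requiring $\theta \ge \beta + \frac{3}{2}\sqrt{\varepsilon} + \nu_\varepsilon$ (as specified in
the proposition).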

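Next, we argue that $(p, q)$ also satisfies the second condition of being $\beta$-good.

To do so, let $\hat\theta = \frac{1}{2}\angle smt$ be half of the angle spanned by sm and tm. Note that
by the definition of the $\theta$-medial axis $M_\theta$, we have that $\hat\theta \ge \theta$. See the right figure
for an illustration. First, observe that the ball D centered at the midpoint of st with radius $r = d(s, m) \cdot
(1 - \cos\hat\theta)$ does not intersect X, since this ball is contained inside the medial ball
$B(m, d(s, m))$. The midpoint of pq is at most $\varepsilon\,\mathrm{lfs}(s) \le \varepsilon\, d(s, m)$ distance
away from the midpoint of st because both p and q are at most $\varepsilon\,\mathrm{lfs}(s)$ away from s and t
(assuming w.l.o.g. $\mathrm{lfs}(s) \ge \mathrm{lfs}(t)$). This means that the ball $D'$
centered at the midpoint of pq and with radius $r' = d(s, m) \cdot (1 - \cos\hat\theta - \varepsilon)$ is
contained in the ball D and thus does not have any point of X, and hence of P, inside.

On the other hand, note that

$$d(p, q) \le d(s, t) + 2\varepsilon\,\mathrm{lfs}(s) \le 2d(s, m)\sin\hat\theta + 2\varepsilon\, d(s, m) = 2d(s, m)(\sin\hat\theta + \varepsilon).$$

Thus, the second condition for p, q being a good pair is satisfied as long as

$$c_\beta \le \frac{1 - \cos\hat\theta - \varepsilon}{2(\sin\hat\theta + \varepsilon)} \le \frac{r'}{d(p, q)}.$$

Consider the function $f(x) = \frac{1 - \cos x - \varepsilon}{\sin x + \varepsilon}$; its derivative $f'(x)$ is greater than 0 for $x \in [0, \pi/2]$. Indeed,

$$f'(x) = \frac{\sin x \cdot (\sin x + \varepsilon) - (1 - \cos x - \varepsilon) \cdot \cos x}{(\sin x + \varepsilon)^2} = \frac{1 - \cos x + \varepsilon\sin x + \varepsilon\cos x}{(\sin x + \varepsilon)^2} > 0.$$

Hence $f(x)$ is an increasing function, and $f(\hat\theta) \ge f(\theta)$ since $\hat\theta \ge \theta$. In other words, the second condition
for $(p, q)$ being a good pair is satisfied as long as $c_\beta \le \frac{1 - \cos\theta - \varepsilon}{2(\sin\theta + \varepsilon)}$. To further simplify it, note that using
$\varepsilon \le \frac{1}{8}\sin^2\theta$, one can show that $\frac{\varepsilon}{\sin\theta} \le \frac{1}{4}\tan\frac{\theta}{2}$. Combining this with $\frac{1 - \cos\theta}{\sin\theta} = \tan\frac{\theta}{2}$, we then have

$$\frac{1 - \cos\theta - \varepsilon}{2(\sin\theta + \varepsilon)} \ge \frac{1 - \cos\theta - \varepsilon}{\frac{9}{4}\sin\theta} = \frac{4}{9}\tan\frac{\theta}{2} - \frac{4\varepsilon}{9\sin\theta} \ge \frac{4}{9}\tan\frac{\theta}{2} - \frac{1}{9}\tan\frac{\theta}{2} = \frac{1}{3}\tan\frac{\theta}{2} \ge \frac{1}{3}\tan\frac{\beta}{2}.$$

Hence as $c_\beta \le \frac{1}{3}\tan\frac{\beta}{2}$, the ball of radius $c_\beta\, d(p, q)$ centered at the midpoint of pq is contained in $D'$ and thus contains no point in P.

Therefore, the pair $(p, q)$ is $\beta$-good and its midpoint is in $L_\beta$.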
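The final simplification can be checked numerically. The sketch below assumes the condition $\varepsilon \le \frac{1}{8}\sin^2\theta$ (this constant is read from the derivation and should be treated as an assumption) and verifies on a grid that $\frac{1-\cos\theta-\varepsilon}{2(\sin\theta+\varepsilon)} \ge \frac{1}{3}\tan\frac{\theta}{2}$. The function names are hypothetical.

```python
import math

def lower_bound_holds(theta, eps):
    """Check (1 - cos t - eps) / (2 (sin t + eps)) >= tan(t / 2) / 3 at one point."""
    lhs = (1.0 - math.cos(theta) - eps) / (2.0 * (math.sin(theta) + eps))
    return lhs >= math.tan(theta / 2.0) / 3.0 - 1e-12

def check_grid(steps=200):
    """Scan theta in (0, pi/2] and eps in [0, sin(theta)^2 / 8]."""
    for i in range(1, steps + 1):
        theta = (math.pi / 2) * i / steps
        eps_max = math.sin(theta) ** 2 / 8.0       # assumed admissible range for eps
        for j in range(steps + 1):
            if not lower_bound_holds(theta, eps_max * j / steps):
                return False
    return True
```

A grid scan is of course not a proof, but it is a quick guard against transcription errors in the constants $\frac{9}{4}$, $\frac{4}{9}$ and $\frac{1}{3}$.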

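Proof of Theorem 2.12. Let x be any point in X to which p is the nearest sample point in P. Then,
$d(x, p) \le \varepsilon\,\mathrm{lfs}(x) \le \varepsilon'\,\mathrm{lfs}(p)$ where $\varepsilon' = \frac{\varepsilon}{1-\varepsilon}$. If p is retained in Q, $d(x, Q) \le \varepsilon\,\mathrm{lfs}(x) \le \varepsilon'\,\mathrm{lfs}(p) \le
\frac{\varepsilon'}{c_2}\, d(p, L_\beta) \le \frac{6\rho}{5}\, d(p, L_\beta)$ for sufficiently small $\varepsilon > 0$, where $c_2$ is the constant from Proposition 2.11. Now
consider the case when p is deleted while processing another point, say $q \in P$. By the decimation procedure
in lines 5-9, $d(q, L_\beta) \ge d(p, L_\beta)$ and q will remain in Q since we process points in non-decreasing order of
their distances to $L_\beta$. Using Proposition 2.11, we then have:

$$d(x, q) \le d(x, p) + d(p, q) \le \varepsilon\,\mathrm{lfs}(x) + \rho\, d(q, L_\beta) \le \varepsilon'\,\mathrm{lfs}(p) + \rho\, d(q, L_\beta)$$

$$\le \frac{\varepsilon'}{c_2}\, d(p, L_\beta) + \rho\, d(q, L_\beta) \le \Bigl(\frac{\varepsilon'}{c_2} + \rho\Bigr)\, d(q, L_\beta) \le \frac{6\rho}{5}\, d(q, L_\beta).$$

The last inequality holds when $\varepsilon$ is sufficiently small (in which case the estimation error in the normal
space is also small). Therefore,

$$d(x, q) \le \frac{6\rho}{5}\,\mathrm{lnfs}_\beta(q). \qquad (7)$$

Now applying Remark 3.6, Q is also $\frac{4\rho}{3}$-dense because for $\rho \le \frac{1}{12}$, $\frac{6\rho/5}{1 - 6\rho/5} \le \frac{4\rho}{3}$.

The fact that Q is $\rho$-sparse w.r.t. $\mathrm{lnfs}_\beta$ follows easily from the decimation procedure.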


B Estimation of Normal/Tangent Space 

Here, we provide the justification for the claimed bound of $O(\varepsilon)$ on the tangent space estimation (and thus the
normal space) of the hidden manifold X at a sample point $p \in P$. For completeness, we restate the procedure
described in Section 2.2 for estimating the tangent space $T_p$. Set $\beta = \frac{\pi}{5}$ for the calculations to follow. Let
s denote the intrinsic dimension of the manifold X, which we assume is known a priori. Let $p_1 \in P$ be the
nearest neighbor of p in $P \setminus \{p\}$. Suppose we have already obtained points $\sigma_i = \{p, p_1, \ldots, p_i\}$ with $i < s$.
Let $\mathrm{aff}(\sigma_i)$ denote the affine hull of the points in $\sigma_i$. Next, we choose $p_{i+1} \in P$ that is closest to p among
all points forming an angle within the range $[\frac{\pi}{2} - \beta, \frac{\pi}{2}]$ with $\mathrm{aff}(\sigma_i)$. We add $p_{i+1}$ to the set and obtain
$\sigma_{i+1} = \{p, p_1, \ldots, p_i, p_{i+1}\}$. This process is repeated until $i = s$, at which point we have obtained $s + 1$
points $\sigma_s = \{p, p_1, \ldots, p_s\}$. We use $\mathrm{aff}(\sigma_s)$ to approximate the tangent space $T_p$. We now show that the
simplex $\sigma_i$ is “fat”. In particular, we will leverage a result (Corollary 2.6) of [4] to bound the angle between
the true tangent space $T_p$ and the approximate tangent space $\mathrm{aff}(\sigma_i)$.
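The greedy procedure restated above can be sketched in a few lines. This is a hedged illustration rather than the authors' code: it assumes the sample points are distinct rows of an array, uses $\beta = \frac{\pi}{5}$ as the default angle, and the function names are hypothetical. Scanning candidates once in order of increasing distance is sufficient, because a candidate whose angle with the current affine hull is too small can only have a smaller angle with any larger hull.

```python
import numpy as np

def angle_to_span(v, basis):
    """Angle in [0, pi/2] between vector v and the linear span of the rows of `basis`."""
    q, _ = np.linalg.qr(basis.T)                   # orthonormal basis of the span
    proj = q @ (q.T @ v)                           # orthogonal projection of v
    cos_angle = np.linalg.norm(proj) / np.linalg.norm(v)
    return np.arccos(np.clip(cos_angle, 0.0, 1.0))

def estimate_tangent_space(P, idx, s, beta=np.pi / 5):
    """Greedily pick s neighbors of P[idx]; returns the chosen direction vectors
    (rows), whose span is the estimated tangent space at P[idx]."""
    p = P[idx]
    vecs = np.delete(P, idx, axis=0) - p
    vecs = vecs[np.argsort(np.linalg.norm(vecs, axis=1))]   # nearest first
    chosen = [vecs[0]]                                      # nearest neighbor p_1
    for v in vecs[1:]:
        if len(chosen) == s:
            break
        ang = angle_to_span(v, np.array(chosen))
        if np.pi / 2 - beta <= ang <= np.pi / 2:            # angle condition
            chosen.append(v)
    return np.array(chosen)
```

If fewer than s admissible neighbors exist, the function returns fewer rows; a caller would need to treat that case as a failure of the density assumption.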

More specifically, we first modify the simplex $\sigma_i$ to another one $\hat\sigma_i$ as follows. Let D denote the longest
length of any edge incident to p in $\sigma_i$. Later we will prove that $D = O(\varepsilon\,\mathrm{lfs}(p))$. Now, we extend each
edge $pp_j$ along the same line segment but to $p\hat{p}_j$ such that $\|p\hat{p}_j\| = D$. The resulting simplex spanned by
$\{p, \hat{p}_1, \ldots, \hat{p}_i\}$ is denoted by $\hat\sigma_i$. By construction, $\mathrm{aff}(\sigma_i) = \mathrm{aff}(\hat\sigma_i)$. Hence, we only need to bound the
angle $\angle(T_p, \mathrm{aff}(\hat\sigma_i))$. Corollary 2.6 of [4] bounds $\sin\angle(\mathrm{aff}(\hat\sigma_i), T_p)$ in terms of L, S, $\mathrm{Vol}(\hat\sigma_i)$ and $\mathrm{lfs}(p)$, where L and S are
the longest and shortest edge lengths of $\hat\sigma_i$ respectively, while $\mathrm{Vol}(\hat\sigma_i)$ stands for the volume of the simplex
$\hat\sigma_i$. To use this result, we bound the terms L, S, and $\mathrm{Vol}(\hat\sigma_i)$.

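See the figure on the right for an illustration. First, we bound the angle
between any two $p\hat{p}_\ell$ and $p\hat{p}_j$, for $\ell, j \in [1, i]$. Assume w.l.o.g. that $j > \ell$.
By construction, $p\hat{p}_j$ forms an angle $\alpha$ such that $\alpha \in [\frac{\pi}{2} - \beta, \frac{\pi}{2}]$ with
$\mathrm{aff}(\hat\sigma_{j-1})$. It follows that $\alpha \le \angle(p\hat{p}_\ell, p\hat{p}_j) \le \pi - \alpha$, that is, $\angle(p\hat{p}_\ell, p\hat{p}_j) \in
[\frac{\pi}{2} - \beta, \frac{\pi}{2} + \beta]$. Therefore, the edge length $d(\hat{p}_\ell, \hat{p}_j)$ satisfies

$$d(\hat{p}_\ell, \hat{p}_j) = 2D \cdot \sin\tfrac{1}{2}\angle(p\hat{p}_\ell, p\hat{p}_j) \in \Bigl[2D \cdot \sin\bigl(\tfrac{\pi}{4} - \tfrac{\beta}{2}\bigr),\; 2D \cdot \sin\bigl(\tfrac{\pi}{4} + \tfrac{\beta}{2}\bigr)\Bigr].$$

Therefore the longest edge length L in simplex $\hat\sigma_i$ is at most $L \le 2D \cdot \sin(\frac{\pi}{4} + \frac{\beta}{2})$, while the smallest edge
length S in simplex $\hat\sigma_i$ is at least $S \ge \min\{D, 2D \cdot \sin(\frac{\pi}{4} - \frac{\beta}{2})\}$.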

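Next, we bound the volume $\mathrm{Vol}(\hat\sigma_i)$ of $\hat\sigma_i$, which we do inductively. We claim that $\mathrm{Vol}(\hat\sigma_i) \ge \frac{D^i \cos^{i-1}\beta}{i!}$.
This claim holds when $i = 1$ in which case $\mathrm{Vol}(\hat\sigma_1) = d(p, \hat{p}_1) = D$. Assume it holds for $i - 1$. Then, we
have that $\mathrm{Vol}(\hat\sigma_i) = \frac{1}{i}\, d(\hat{p}_i, \mathrm{aff}(\hat\sigma_{i-1})) \cdot \mathrm{Vol}(\hat\sigma_{i-1})$, where $h_i = d(\hat{p}_i, \mathrm{aff}(\hat\sigma_{i-1}))$ is the height of the simplex
$\hat\sigma_i$ using $\hat\sigma_{i-1}$ as the base facet. On the other hand, by construction $\angle(p\hat{p}_i, \mathrm{aff}(\hat\sigma_{i-1})) \ge \frac{\pi}{2} - \beta$, which gives

$$h_i = d(p, \hat{p}_i) \cdot \sin\angle(p\hat{p}_i, \mathrm{aff}(\hat\sigma_{i-1})) \ge D \cdot \cos\beta.$$

It follows that $\mathrm{Vol}(\hat\sigma_i) \ge \frac{1}{i} \cdot D\cos\beta \cdot \mathrm{Vol}(\hat\sigma_{i-1}) \ge \frac{D^i \cos^{i-1}\beta}{i!}$, which then proves the claim inductively.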
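Now we derive an upper bound on D. Inductively, assume that for $1 \le i < s$,

$$D \le 13\varepsilon\,\mathrm{lfs}(p) \quad\text{and}\quad \theta_i := \angle(\mathrm{aff}(\hat\sigma_i), T_p) \le \arcsin\left(\frac{i!\, 2^{i+2} D \sin^{i+2}(\frac{\pi}{4} + \frac{\beta}{2})}{\cos^{i-1}\beta\, \sin(\frac{\pi}{4} - \frac{\beta}{2})\,\mathrm{lfs}(p)}\right).$$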


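For $i = 1$ and sufficiently small $\varepsilon$, it is true because the nearest point $p_1$ to p satisfies $d(p, p_1) \le 3\varepsilon\,\mathrm{lfs}(p)$ and
also $\sin\angle(pp_1, T_p) = O(\varepsilon)$ (this follows easily from the $\varepsilon$-dense sampling condition, see e.g. Corollary 3.1
and Lemma 3.4 of [14]). For induction consider the time when we choose $p_i$. Consider the projection $\sigma'_{i-1}$ of
$\sigma_{i-1}$ onto $T_p$ and the $(i-1)$-dimensional affine subspace $\mathrm{aff}(\sigma'_{i-1})$ of $T_p$ containing this projection. By our
inductive hypothesis, $\angle(\mathrm{aff}(\sigma_{i-1}), \mathrm{aff}(\sigma'_{i-1})) \le \theta_{i-1}$. Let F be the subspace of $T_p$ orthogonal to $\mathrm{aff}(\sigma'_{i-1})$ and
let $x \in F$ be such that $d(x, p) = 10\varepsilon\,\mathrm{lfs}(p)$. The closest point $\bar{x} \in X$ of x to X has $d(x, \bar{x}) = O(\varepsilon^2\,\mathrm{lfs}(p))$.
Therefore, we can assume that

$$9\varepsilon\,\mathrm{lfs}(p) \le d(p, \bar{x}) \le 11\varepsilon\,\mathrm{lfs}(p)$$

when $\varepsilon > 0$ is sufficiently small. There is a sample point $p' \in P$ with $d(\bar{x}, p') \le \varepsilon\,\mathrm{lfs}(\bar{x})$. This means
that the angle $\angle(p\bar{x}, pp')$ is at most $\arcsin\bigl(\frac{\varepsilon\,\mathrm{lfs}(\bar{x})}{d(p, \bar{x})}\bigr) \le \arcsin\frac{1}{8}$ when $\varepsilon$ is sufficiently small. It follows that
$\angle(pp', \mathrm{aff}(\sigma_{i-1})) \ge \frac{\pi}{2} - \arcsin\frac{1}{8} - \theta_{i-1}$. One can make $\theta_{i-1}$ arbitrarily small by choosing $\varepsilon$ sufficiently
small. Therefore, if $\beta = \frac{\pi}{5}$ and $\varepsilon$ is small enough, we have $\angle(pp', \mathrm{aff}(\sigma_{i-1})) \in [\frac{\pi}{2} - \beta, \frac{\pi}{2}]$. Since $p_i$ is chosen
with the smallest distance from p satisfying the above angle condition, we have, for small enough $\varepsilon$,

$$d(p, p_i) \le d(p, p') \le d(p, \bar{x}) + d(\bar{x}, p') \le 11\varepsilon\,\mathrm{lfs}(p) + \varepsilon\,\mathrm{lfs}(\bar{x}) \le 13\varepsilon\,\mathrm{lfs}(p).$$

Since D cannot be larger than the maximum between the older D from stage $i - 1$ and $d(p, p_i)$, one has
$D \le 13\varepsilon\,\mathrm{lfs}(p)$. Combining all these with Corollary 2.6 of [4], we obtain the bound on $\sin\angle(\mathrm{aff}(\hat\sigma_i), T_p) = \sin\theta_i$ as
claimed.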

Evaluating $\sin\theta_i$ we obtain $\sin\theta_i = O(\varepsilon)$ for all $i \in [1, s]$ where the big-O notation hides
constants depending exponentially on the intrinsic dimension s and $\cos\beta$. In other words, the angle $\nu_\varepsilon$
between the approximate tangent space and the true tangent space (thus between the approximate normal
space and the true normal space) at any sample point is bounded by $O(\varepsilon)$, where the big-O notation hides
constants depending on the angle $\beta$ and the intrinsic dimension s of the manifold X.